<<

On the Visualisation of Large User Models in Web Based Systems

James B. Uther

[University of Sydney crest: Sidere mens eadem mutato]

A Thesis Submitted for the Degree of Doctor of Philosophy

The University of Sydney

November 2001

© James B. Uther 2001

ABSTRACT

This thesis describes the creation and refinement of a new tool for visualising large user models that can be made available to users on the World Wide Web.

User models are the set of beliefs a (software) system holds about a user. User-adapted applications, and increasingly, web sites, use a user model to help the interaction with a user. As these models start to contain more personal and sensitive information, and affect the experience of the software user, it becomes important for the user to be able to inspect and control that data.

This thesis presents work that aims to help users see an overview of the data and beliefs contained in their user model. While there has been work on scrutable user models that support exploration and user control [Kay99, ZRNG99], it has been focused on the inspection of individual model components. This thesis helps users quickly search for interesting features in models of several hundred components. This thesis presents the design and implementation of three iterations of the tool, and user tests of each design. The final implementation is evaluated in a trial with more than 50 users.

Much recent work on user-adapted systems has involved adaptive hypertext and services on the World Wide Web. An important feature of the work presented here is its ability to operate as a natural part of a web site. Furthermore, the user model format presented here leverages an Internet standard for complex metadata, allowing for inter-operation with a broad range of web services.

ACKNOWLEDGEMENTS

This thesis has been in gestation for longer than most. Therefore, the list of people to whom I owe thanks is nearly endless. I’ll try and pick out some examples here:

During the course of this PhD Maria and I have seen a lot of what life has to offer. Throughout it all she has been at my side with encouragement, advice, friendship and patience. I couldn’t have kept going without her, and I know of no words that can truly express my gratitude for her support. She truly is my best friend. By the time you read this she’ll also be my wife!

My supervisor A/Prof Judy Kay has, once again, shown patience and courage above and beyond the call of any academic duty. A/Prof Alan Fekete, Prof. Ann Sefton and A/Prof Bob Kummerfeld also did their share of keeping me somewhat on track through some difficult times. And of course my family were always there when I needed help. They always are, which is wonderful.

For the most part I was not enrolled full time. I’d like to thank my various employers during this period for their patience in allowing me to be distracted by, and even supporting, this work, which was not always related to what they really wanted me to do. The Faculty of Medicine at the University of Sydney were always more than generous in offering me space and resources to apply my work there to my research for this thesis. In particular Prof. Ann Sefton (again), Dr Jill Gordon, Mr Stewart Barnet, Prof. Simon Carlile and Mr Wayne Davies were always helpful and encouraging. My cow-orkers, Vicki & Chris, should also be mentioned for their stimulating political analysis and coffee.

My more recent employers, F-Secure, have also been quite understanding in these last few months of writing. My productivity at work has dropped alarmingly and I’ve been trying to tell anyone who’ll listen that virus scanners are not complete without an integrated scrutable user model. With any luck I’ll get better now (although I’m convinced that IPSec configuration tools could well do with some user modelling help ...).

DEDICATION

This thesis is dedicated to Maria. Love is putting up with several years of ruined week- ends and evenings, and still encouraging me to ruin more!

Thank you.

Contents

1 Introduction 1

1.1 Scrutable User Models ...... 5

1.1.1 User Model Servers ...... 7

1.1.2 Differing Interpretations of Data ...... 8

1.2 Summary and Thesis Overview ...... 8

2 Background 11

2.1 User Models ...... 11

2.1.1 Scrutable User Models ...... 12

2.1.2 Visualisation of User Models ...... 14

2.2 Information Visualisation ...... 19

2.2.1 Visual Properties and Coding ...... 20

2.2.2 Dynamic Queries ...... 21

2.2.3 Focus + Context and Distortion ...... 25

2.2.4 Animation ...... 30

2.3 The World Wide Web ...... 31

2.3.1 Data ...... 31

2.3.2 Transfer ...... 34

2.3.3 Names ...... 35

2.3.4 Summary ...... 36

2.4 Knowledge Representation on the Web ...... 36

2.4.1 Metadata ...... 36

2.4.2 Ontologies ...... 37

2.4.3 Resource Description Framework ...... 38

2.5 The Java Programming Environment ...... 43


3 Domains and Models 45

3.1 Medical Knowledge ...... 45

3.1.1 Learning Topics ...... 46

3.1.2 Online Assessment ...... 46

3.1.3 The Generated Online Assessment Model ...... 51

3.2 The Movie Preferences Domain ...... 54

4 Design Constraints and Early Design Experiments 59

4.1 Design Constraints for a Visualisation of Large User Models ...... 59

4.2 Description of a VlUM Model ...... 61

4.2.1 The Component Data ...... 61

4.2.2 The Graph ...... 62

4.3 Version One ...... 62

4.3.1 Display ...... 63

4.3.2 Implementation ...... 65

4.3.3 First Formative Evaluation ...... 69

4.4 Version Two ...... 71

4.4.1 Second Evaluation ...... 71

5 VlUM 2.0 77

5.1 Appearances ...... 77

5.1.1 Menu Bar ...... 79

5.1.2 Slider ...... 80

5.1.3 The Display ...... 81

5.1.4 Selection ...... 84

5.1.5 Status Bar ...... 85

5.1.6 Experiments in Anti-Aliasing and Transparency ...... 85

5.2 File Format ...... 86

5.2.1 Startup ...... 88

5.3 A Software Environment for Managing and Monitoring Experiments ...... 89

5.3.1 Asking Questions ...... 90

5.3.2 Logging ...... 94

5.3.3 Marking Questions ...... 96

5.3.4 Benefits ...... 97

6 Evaluation 99

6.1 Aim ...... 99

6.2 Method ...... 100

6.3 Analysis ...... 104

6.4 Results ...... 106

6.4.1 Time to Answer ...... 106

6.4.2 Percentage of Correct Answers ...... 109

6.4.3 Steps Taken to Answer ...... 109

6.4.4 Results by Task ...... 111

6.4.5 Effect of Participant Age ...... 122

6.4.6 Other Participant Differences ...... 125

6.5 Summary ...... 128

7 Conclusions 131

7.1 Future Directions ...... 132

7.1.1 User Model Representation ...... 132

7.1.2 Visualisation ...... 133

7.1.3 Further Uses ...... 135

7.2 Contributions ...... 137

A Learning Topic Example 139

B Online Assessment User Surveys 141

B.1 First User Survey ...... 141

B.1.1 Mail to Students ...... 141

B.1.2 Student Responses ...... 142

B.2 Second User Survey ...... 145

C Movie Domain Question File 149

D Graphs from the Evaluation 159

D.1 Tutorial ...... 160

D.2 Experiment ...... 165

E RDF File of a Movie Recommendation Model 183

F RDF File of an Online Assessment Model 185

G Movie Experiment Logs 187

G.1 Tasks Done Well ...... 187

G.2 An Average Session ...... 199

G.3 A Poor Session ...... 209

H Worked Examples of Tasks 221

I Dot Graphs 229

List of Figures

2.1 The QV tool showing a user model for a user of the SAM text editor. Image courtesy of J. Kay ...... 15

2.2 The QV tool showing a user model for a user of the SAM text editor with all branches expanded. Image courtesy of J. Kay ...... 16

2.3 An example of a VISNET visualisation of a Bayesian Belief Network. Figure from [ZRNG99]. ...... 17

2.4 The ‘viewer control panel’ from a viewer in TAGUS. Image from [PS95]. .. 18

2.5 The ‘partition editor’ from the BGP-MS system. Image from [KP95]. .... 19

2.6 A screenshot of SEESOFT visualisation. In this example, darker lines have been executed more times in an execution of the program...... 21

2.7 A screenshot of a dynamic query tool for exploring the periodic table of elements. Image courtesy of B. Shneiderman, http://www.cs.umd.edu/hcil/spotfire/. ...... 22

2.8 Dynamic query tools often use a double slider to allow the user to specify a range of values...... 22

2.9 A screenshot of HOMEFINDER visualisation and dynamic query tool. Image courtesy of B. Shneiderman, http://www.cs.umd.edu/hcil/spotfire/ ...... 23

2.10 A screenshot of FILMFINDER visualisation and dynamic query tool. Image courtesy of B. Shneiderman, http://www.cs.umd.edu/hcil/spotfire/...... 23

2.11 A screenshot of SPOTFIRE visualisation and dynamic query tool exploring an abstract data set of biochemical and pharmaceutical companies...... 24

2.12 ...... 25

2.13 ...... 26

2.14 The Inxight STARVIEWER hyperbolic browser showing a tree of recipes. . . 27

2.15 A screenshot of THE PERSPECTIVE WALL...... 27

2.16 A screenshot of THE TABLE LENS analysing mutual fund performance. . . 28


2.17 A screenshot of a hyperbolic space graph layout. Image courtesy of T. Munzner. ...... 29

2.18 DEXTER showing few JBW’s...... 30

2.19 Some rendered HTML ...... 34

2.20 A Simple RDF Statement ...... 39

2.21 A less simple RDF graph...... 39

2.22 A relatively complex RDF graph...... 40

3.1 A multiple true/false question in Online Assessment ...... 47

3.2 Answer in Online Assessment ...... 48

3.3 Question Statistics in the Online Assessment...... 48

3.4 Feedback submission in the Online Assessment ...... 49

3.5 Previous Feedback page in the Online Assessment ...... 50

3.6 End of Session Statistics in the Online Assessment ...... 50

3.7 IMDB entry for ...... 54

3.8 IMDB entry for Audrey Hepburn ...... 55

3.9 Munzner’s H3 system when displaying an inhomogeneous network. Image courtesy of T. Munzner. ...... 57

4.1 First implementation of the visualisation ...... 64

4.2 Stretching in the first implementation ...... 65

4.3 First RDF Schema ...... 66

4.4 First implementation of selection ...... 67

4.5 ...... 69

4.6 Second version of the visualisation...... 70

4.7 Second implementation of selection ...... 70

4.8 Second RDF Schema ...... 72

4.9 A cluttered display ...... 72

4.10 Third implementation of selection ...... 73

5.1 The visualisation in VlUM 2.0...... 78

5.2 VlUM 2.0 in a web browser, showing the currently selected movie...... 79

5.3 VlUM 2.0 in a web browser, showing the DISPLAY menu for the Online Assessment model. The Learning Topic shown is mocked up. ...... 80

5.4 VlUM 2.0 in a web browser, showing evidence for the currently selected Learning Topic. The evidence in this image is mocked up...... 81

5.5 Effect of using slider in VlUM 2.0 ...... 82

5.6 Effect of selecting a component in VlUM 2.0. ...... 83

5.7 The component under the mouse is Effect of virus on host cells. A component becomes white when the mouse pointer is moved over it. ...... 83

5.8 Selection in VlUM 2.0 ...... 84

5.9 Problems with the selection algorithm in VlUM 2.0 ...... 84

5.10 Comparisons between drawing methods ...... 86

5.11 Final RDF Entry ...... 87

5.12 RDF from movie domain ...... 88

5.13 XML configuration file ...... 89

5.14 A normal question...... 91

5.15 An age Question ...... 91

5.16 A radio Question ...... 92

5.17 A comment Question ...... 92

5.18 Examples from the XML Question file used to run user tests on VlUM 2.0. . 93

5.19 A log from a VlUM 2.0 session ...... 95

6.1 Screenshots of an EASY task...... 102

6.2 Screenshots of a COUSIN task...... 103

6.3 Initial appearance of VlUM 2.0 for each data set size...... 105

6.4 Plot of times taken to complete all the tasks in the tutorial and the experiment. ...... 107

6.5 Plot of times taken to complete the tutorial in the experiment...... 107

6.6 Plot of times taken to complete the main part of the experiment...... 108

6.7 Time to answer task EASY...... 109

6.8 Percentage of participants giving correct answers for task EASY...... 110

6.9 Steps taken to correctly complete task EASY...... 110

6.10 Comparison between time to answer EASY in the tutorial and experiment. . 112

6.11 Comparison between time to answer task HARD in the tutorial and the experiment. ...... 113

6.12 Comparison between the time to answer task CERT in the tutorial and experiment. ...... 115

6.13 Comparison between the time to answer task REC in the tutorial and experiment. ...... 117

6.14 Comparison between the time to answer task CERTEASY in the tutorial and experiment...... 118

6.15 Comparison between the time to answer task CERTREC in the tutorial and the experiment...... 121

6.16 Average time to answer...... 123

6.17 Average number of steps to complete a task...... 123

6.18 Plot of time to complete the experiment compared with the age of the participant. ...... 124

6.19 Plot of marks compared to participant age...... 124

6.20 Plot of average mark achieved by people below given ages...... 125

6.21 Scatterplot of the difference in times between the tutorial and the experiment. ...... 126

6.22 Scatterplot of the time to complete the tutorial on the y axis and the time to complete the experiment tasks on the x axis. ...... 127

6.23 Scatterplot of the difference in marks between tutorial and main experiment for each participant...... 127

B.1 I use the Online Assessment system on my own (0 = often, 5 = never). ...... 145

B.2 I use the Online Assessment system with others (0 = all of the time, 4 = never)...... 146

B.3 The format of questions suits my self assessment and learning style (0 = always, 4 = never). ...... 146

B.4 On average I spend the following amount of time on any one ‘session’ (0 = more than 30 minutes, 5 = less than 10 minutes). ...... 147

B.5 The questions are relevant to the problems (0 = never, 4 = often). ...... 147

B.6 The explanations of why answers are right or wrong are useful for my learning (0 = never, 4 = often). ...... 148

B.7 If I have a comment or query about a question, I use the feedback button (0 = often, 4 = never). ...... 148

D.1 Time to answer task ‘easy’ in tutorial. Times include the reading time for the task...... 160

D.2 Steps taken to correctly complete task ‘easy’ in tutorial...... 160

D.3 Time to answer task ‘hard’ in tutorial...... 161

D.4 Steps taken to correctly complete task ‘hard’ in tutorial. ...... 161

D.5 Time to answer task ‘rec’ in tutorial...... 161

D.6 Steps taken to correctly complete task ‘rec’ in tutorial...... 162

D.7 Time to answer task ‘cert’ in tutorial...... 162

D.8 Steps taken to correctly complete task ‘cert’ in tutorial...... 162

D.9 Time to answer task ‘certeasy’ in tutorial...... 163

D.10 Steps taken to correctly complete task ‘certeasy’ in tutorial...... 163

D.11 Time to answer task ‘certrec’ in tutorial...... 163

D.12 Steps taken to correctly complete task ‘certrec’ in tutorial...... 164

D.13 Time to answer task ‘easy’...... 165

D.14 Percentage of correct answers for task ‘easy’...... 165

D.15 Steps taken to correctly complete task ‘easy’...... 166

D.16 Time to answer task ‘cousin’...... 166

D.17 Percentage of correct answers for task ‘cousin’...... 167

D.18 Steps taken to correctly complete task ‘cousin’...... 167

D.19 Time to answer task ‘hard’...... 168

D.20 Percentage of correct answers for task ‘hard’...... 168

D.21 Steps taken to correctly complete task ‘hard’...... 169

D.22 Time to answer task ‘recslider’...... 169

D.23 Percentage of correct answers for task ‘recslider’...... 170

D.24 Steps taken to correctly complete task ‘recslider’...... 170

D.25 Time to answer task ‘cert’...... 171

D.26 Percentage of correct answers for task ‘cert’...... 171

D.27 Steps taken to correctly complete task ‘cert’...... 172

D.28 Time to answer task ‘rec’...... 172

D.29 Percentage of correct answers for task ‘rec’...... 173

D.30 Steps taken to correctly complete task ‘rec’...... 173

D.31 Time to answer task ‘certeasy’...... 174

D.32 Percentage of correct answers for task ‘certeasy’...... 174

D.33 Steps taken to correctly complete task ‘certeasy’...... 175

D.34 Time to answer task ‘certhard’...... 175

D.35 Percentage of correct answers for task ‘certhard’. ...... 176

D.36 Steps taken to correctly complete task ‘certhard’...... 176

D.37 Time to answer task ‘hardrec’...... 177

D.38 Percentage of correct answers for task ‘hardrec’...... 177

D.39 Steps taken to correctly complete task ‘hardrec’...... 178

D.40 Percentage of correct answers for task ‘hardrec’, taking Roman Holiday as a correct answer in all data sets...... 178

D.41 Time to answer task ‘easyrec’...... 179

D.42 Percentage of correct answers for task ‘easyrec’...... 179

D.43 Steps taken to correctly complete task ‘easyrec’...... 180

D.44 Time to answer task ‘certrec’...... 180

D.45 Percentage of correct answers for task ‘certrec’...... 181

D.46 Steps taken to correctly complete task ‘certrec’...... 181

D.47 Average percent correct...... 182

H.1 A HARD task...... 221

H.2 A REC task...... 222

H.3 A CERT task...... 223

H.4 A CERTEASY task...... 224

H.5 A CERTHARD task...... 225

H.6 A HARDREC task...... 225

H.7 A RECSLIDER task...... 226

H.8 An EASYREC task. ...... 226

H.9 A CERTREC task...... 227

H.10 An example of display stretching...... 227

H.11 Quick stretching example...... 228

I.1 Graph of the 100 movie data set drawn with DOT...... 230

I.2 Graph of the 300 movie data set drawn with DOT ...... 231

Chapter 1

Introduction

This thesis describes the creation and refinement of a new tool, VlUM, for visualising large user models, via the World Wide Web.

A user model is the set of the system’s beliefs about a user. These models are enabling the increasing personalisation of software, particularly on the Internet where commercial web sites have discovered the importance of ‘knowing’ their regular visitors [KKP01]. The user model is the set of information and beliefs that is used to personalise the web site, and often contains personal information such as sex, age and credit status. It is becoming vital for users to be able to see and control the personal information that the user model is based upon.

This need is catching the attention of governments. For example, Article 10 of the European Community Directive on Data Privacy [PtCotEU95] upholds a citizen’s ‘right of access to and the right to rectify the data’, while Article 12 requires that the information shown to an enquiring citizen be ‘an intelligible form of the data undergoing processing and of any available information as to their source’. The spirit of this legislation is the view that personal data belongs to the person concerned, and should be truly accessible and controlled by that person. It is likely that future system builders will become responsible for helping the user to understand, access, and correct personal information stored on their system.

The ability to control personal information is a neglected area of current personalisation research and practice. The few user models that support this inspection and control by the owner of the information are here termed scrutable user models after the work of Kay [Kay99].

To date, the work on scrutable user models [CK94, CKRT95, ZRNG99, Kay99] has been limited to examination of the details of small models with fewer than 100 ‘beliefs’, or ‘components’. Kay argues that to be truly scrutable, a user must be able to get an overview of the model; however, previous work concentrates on supporting detailed scrutiny. The importance of an overview becomes apparent when considering a user model held by a


large commercial web site. The user model informing the personalisation of the Internet bookstore AMAZON.COM (http://www.amazon.com/) could potentially represent the user’s interest in books:

• ‘browsed’ on the site;

• placed on gift lists;

• bought;

• reviewed;

• bought by other people with similar buying patterns.

This would quickly reach several hundred beliefs.

The size of the model grows further when we consider sharing the model between applications. This reuse can enable a better user experience by helping new applications to instantly ‘know’ the user. However, collecting beliefs from multiple sources multiplies the size of the model yet again.

As the number of beliefs in the model grows, it becomes increasingly difficult for the user to get an overview of the model and find useful data, and more difficult still to find patterns or surprising data. The current methods for displaying models fail, and it becomes necessary to display the model in a way that helps the user understand and use the model. The practice of visual display of often large bodies of information is called visualisation [Ber81, HMM99].

This thesis is devoted to this first and critical part of user control of their model: its overview visualisation. Attempts to allow scrutinising and correction of a model will fail if the user cannot get an overview of the model, or find interesting and surprising information in the model. In particular, the tool must allow a user to find:

1. what the model holds to be true about the user;

2. what the model holds to be false;

3. how strongly the model holds the belief;

4. how certain the system is about the belief;

5. how the beliefs are related, if there is a relationship between them.

These assume a user model with boolean component beliefs, which is not always the case, although non-boolean beliefs can usually be treated as boolean effectively (for example, by applying a threshold to a numeric strength). Since the model is likely to be large, the tool must help the user to find this information by allowing the user to:

• get an overview of the whole model

• get a clearer overview of a subset of related beliefs in the model

• adjust the sensitivity of the display so that the user can decide what strength should be treated as true. (A minimal data sketch of such a component follows this list.)
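To make these requirements concrete, here is a minimal sketch, in Java (the language used to implement VlUM), of one possible shape for a model component carrying a belief strength and a certainty, judged against an adjustable standard. The class and method names are hypothetical, not part of VlUM’s actual implementation.

    // Illustrative sketch only: one possible representation of a model
    // component holding a belief strength and a certainty, interpreted
    // against an adjustable standard. Not VlUM's own API.
    public class ModelComponent {
        private final String title;
        private final double strength;   // how strongly the belief is held, 0.0 to 1.0
        private final double certainty;  // how sure the system is of that strength

        public ModelComponent(String title, double strength, double certainty) {
            this.title = title;
            this.strength = strength;
            this.certainty = certainty;
        }

        // A component is only 'true' relative to a user-chosen standard,
        // such as the slider setting described in the scenarios below.
        public boolean heldTrue(double standard) {
            return strength >= standard;
        }

        public String getTitle()     { return title; }
        public double getStrength()  { return strength; }
        public double getCertainty() { return certainty; }
    }

Under a sketch of this kind, a display could colour each title green or red according to heldTrue at the current slider setting, which is exactly the behaviour described in the examples that follow.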

Some Examples

Some examples may help illustrate these concepts. Two examples are offered here, one from each of the domains modelled in this thesis. The first is of Jane, a subscriber to a movie recommendation service who wishes to investigate the service’s model of her. The second considers Tom, a medical student, who uses the user model generated by a quiz system to guide his exam preparation.

Jane receives some e-mail from a movie recommendation service to which she subscribes. Out of curiosity, she would like to know more about what the recommender believes about her preferences. In particular, she would like to see what other movies it thinks she would like, and how strongly it believes its recommendations. The recommendation service web site includes a page which uses VlUM to show the model of the user. Jane goes to this page to view her model. She can immediately see some interesting information:

• The service strongly believes that she very much likes the movie The Matrix. Jane enquires about why it believes this (by using a menu on the tool). The service says that ‘on 10-June-1999 you told me that you saw this movie and very much liked it’.

• The service believes somewhat strongly that Jane will like the movie Circle of Friends, a movie she has never seen. On inquiry, the system says that ‘Most users of your age and sex who have seen that movie have liked it a lot. On the other hand, you don’t always like movies that your peers like, so the recommendation is tentative’.

• Returning to The Matrix, Jane decides to find movies made by the same directors. The model supplied includes information about what movies are related based on similarity in director and cast, so VlUM enables Jane to easily find that Bound (1996) was also written and directed by the Wachowski Brothers. On reading the description of Bound, Jane decides that she wouldn’t like it, and fills in a rating box in the web page to say so. This information is added to her user model as somewhat certain evidence that she would not like the movie (she says so, but she hasn’t actually seen it).

• Jane would like to find all movies that the system feels she would like very much, regardless of how much evidence it has for that belief. This is quite a difficult task, because there are some 700 movie titles packed into the side of a browser window and many are rated positively, so picking them out one by one is nearly impossible. Instead, she can tell VlUM to highlight titles that it believes she will really like. She does this by sliding an interactive control to a setting which makes only the highest recommendations stand out. On this setting, the VlUM display shows only the most highly rated movies in green, and all others in red.

Meanwhile, Tom, a medical student, is revising for his end of year exams. Throughout his time as a medical student Tom has been making regular use of an on-line quiz system.

This system contains questions and answers written by the lecturers, and allows students in the course to practise and assess their mastery of the topics as they go along. Luckily for Tom the system has also been tracking his answers to the questions (with his knowledge and consent, of course). It has built a model of his strengths and weaknesses in the course, which Tom can now use to guide his revision.

• Tom goes to the page on the course web site that has the VlUM display for his knowl- edge of medicine. This shows him all the topics in the medical course, and also shows the system’s assessment of how well he knows each topic, and how strong that belief is.

• He decides to aim for a mark of at least 75% in all topics. He instructs VlUM to treat all topics as failed if they have earned a score of less than 75%. This has the effect of instructing VlUM to show only the topics with scores over 75% in green, indicating the topic has been passed at that level. All scores worse than this will be in red, indicating that he has performed worse than the set mark. It is these red topics that Tom wants to focus on since they are the areas where he is weak.

• One topic in particular, Growth and nutrition in indigenous children, is very red. Tom selects this topic and tries some questions. Unfortunately he finds he is simply out of his depth at this stage. He needs to plan his study.

– Tom decides that the best way to get a grasp on Growth and nutrition in indigenous children is to first study a related topic that he knows more about. He can use VlUM to find such a topic, because the model in use by VlUM contains relationships between topics based on a measure of their similarity. VlUM can therefore show the ‘peers’ of any selected topic. Tom selects Growth and nutrition in indigenous children in VlUM and all related topics are highlighted so that they stand out.

– Tom can now select the highlighted title about which he knows most. Of course, VlUM has the topic it believes he knows most about in bright green, so this is easy. He selects the topic, Acute respiratory infection in indigenous children.

– Tom is already quite knowledgeable about respiratory infection, so he quickly masters this topic. The extra knowledge he picks up on indigenous health then helps him go back and master Growth and nutrition in indigenous children.

• Tom finds another red topic, Anatomy of the G.I.T. He remembers trying questions from this topic before it had been covered in class, and (as can be expected) performing badly. It has since been covered in class, so Tom goes and tries the questions again. This time he does better. His new results become more evidence in the user model, this time showing that he does know the topic. Consequently, the title of that topic as displayed by VlUM becomes less red, and then increasingly green.

• As a medical student, Tom is quite competitive, and wishes to do better than average in his cohort. VlUM can compare two models, and allows Tom to see his model

compared to an average of his cohort. This shows topics in which he is doing better than his cohort’s average in green, and others in red. By using an interactive control, he can change the mark at which VlUM changes the topic colour from red to green. Tom wants to do at least 10% better than average, so he slides the control up a bit. The display now shows all topics in which he is doing 10% better than the cohort average in green, and the rest in red.

• As a final check on his preparation, Tom wants to see how his knowledge compares to what the lecturers think a student should know at this stage. As it happens, VlUM can find a model of an ‘average student’ prepared by the lecturers. It can then compare that model with Tom’s. He can see topics in which he is not performing to the expected standard in red, and others in green. Again, if he wishes to perform better than the expected standard, he can ask VlUM to change the point at which topics change from red to green.

After a few weeks of this, Tom confidently passes his exams at better than his target 75%.

As can be seen in both these examples, there is great utility in being able to offer this service on Web sites, using data and integrating with other web services. The medical faculty at which our student Tom has been studying had previously decided (wisely) to write their quiz software as a web application. This allowed students to use the application from almost any location at any time, while also ensuring an up-to-date set of questions, and allowing the collection of a large body of data with which to build user models. Similarly, Jane was already a user of the movie recommendation web service. It was natural for them, when they added the model inspection tool, to use one that was able to operate as part of the web site.

The tools described in this thesis were designed to operate within the Web. This imposes constraints on the solutions offered, but it was felt that it was important to explore these constraints, and the links between emerging web tools and standards and the research in scrutable user models.

1.1 Scrutable User Models

There are several similar definitions of user model, as well as a bevy of related terms. These range from the near synonyms student model and learner model, to related terms including cognitive models, system models, task models and others. In this thesis I follow the emerging use of terms for user-adapted systems coming from Wahlster and Kobsa [WK86], and Kay [Kay99], and for human-computer interaction (HCI) from Norman [Nor83]:

user model the system’s set of beliefs about the user;

model component one of the beliefs of the model;

user modelling tools tools to create and maintain user models;

user model consumer programs that make use of the user model. This could also be known as a user-adapted or user-aware application;

user modelling shell a reusable user model representation and supporting tools, so that the model may be used by model consumers with different needs.

The VlUM tool described in this thesis might be classified as a ‘user model consumer’ in that it consumes the model in order to display it. It requires ‘user modelling tools’ to build and maintain the model for its consumption. Parts of the VlUM implementation are also an attempt to standardise the user model format itself, and so these parts might be seen as a component in a ‘user modelling shell’. Within a user modelling shell that is scrutable, VlUM could be seen as an important component, since it supports and enhances the scrutability.

The term ‘user model’ should be further defined. A user model is a representation of a set of beliefs about the user, particularly their knowledge in various areas, and their goals and preferences. Systems may then use this model to help a user in various ways. Jameson’s User Modelling Conference Reader’s Guide [JPT97] gives a number of these uses, including:

• Helping the user find information

• Tailoring information presentation to the user

• Adapting an interface to the user

• Choosing suitable instructional exercises or interventions

• Giving the user feedback about their knowledge

• Supporting collaboration

• Predicting the user’s future behaviour

Kay [Kay99] simplifies this, arguing that by allowing the machine to ‘know’ the user, the machine can improve its activities in three ways:

1. It can improve communication bandwidth into the machine by better interpreting user actions and requests.

2. It can drive customisation of the activities within the machine, so the machine can tailor operations to the individual user.

3. It can customise the presentation of the information coming out of the machine to better inform the individual user.

Thus user models are central to the design of User Adaptive Systems. Any system that adapts itself to a particular user must contain some model of the user, whether implicit or explicit. Often these models are implicit, coded tightly into the system. One of the benefits of user modelling tools is that they encourage system designers to be explicit when modelling the user, and then enable the designers to leverage the tool to provide more advanced levels of user adaptivity. In the case of VlUM, an explicit model combined with a model visualisation tool allows the system designer to show the user the beliefs that underlie the behaviour of the system.

The work in this thesis was motivated in part by the goal of creating a tool to support the scrutability of the user model. However, by allowing ad-hoc exploration of the model, it makes the model itself a more useful tool. The user model can be re-purposed, and become an interesting source of data for the user independently of any user adaptive system that might also use the model.

In the examples given previously, the user was able to achieve useful outcomes by exploring the model with the aid of VlUM. In the first scenario, Jane could find movies that are related to a movie she is known to like by exploring the model with VlUM. In the second example, Tom is able to effectively plan his study with the aid of VlUM. He used the tool to find topics in which he was performing less well than desired, and also to plan strategies for learning difficult topics. VlUM also helped him monitor his progress as exams approached.

1.1.1 User Model Servers

There is great power in sharing user models between applications. For instance, an automated coach for the text editor SAM might teach about regular expressions by taking account of the user’s knowledge. This might be enhanced if it could make use of a model of the user’s previous use of regular expressions. This model might have been built by another program that tracked the user’s knowledge of the EMACS text editor. Equally, it might make good use of a user model built by watching how the user finds patterns in text files using the UNIX GREP command, which can also use regular expressions.

Similarly, large web sites that offer a multitude of services, like YAHOO.COM (http://www.yahoo.com/), are often built on a number of disparate systems, all with a similar web page design. Sharing a user model between these servers allows the web company to offer personalisation across all their systems.

In the extreme, a user model may reside on a single ‘user model service’, and be used by applications on the Internet. For instance, the fictional service MYUM.NET could hold Jane’s user model. YAHOO and AMAZON are both told that the user model resides on MYUM.NET, and use that service as the basis of their personalisation. An agreed ontology for describing common model components allows them to share their understanding of the user, and the user’s interaction with both sites becomes simpler and more pleasant.
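As a sketch of how such a service might be consumed, the hypothetical Java fragment below fetches a serialised model from a model server by URL. The service address, and the assumption that a model is retrievable by a plain HTTP GET, are illustrative only; no such protocol is specified in this thesis.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    // Hypothetical sketch: an application fetching a user's model from a
    // remote user model service over HTTP. The address and format are
    // assumptions for illustration.
    public class ModelFetcher {
        public static String fetchModel(String modelUrl) throws Exception {
            StringBuilder rdf = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(modelUrl).openStream(), "UTF-8"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    rdf.append(line).append('\n');
                }
            }
            return rdf.toString(); // e.g. a serialised RDF model
        }

        public static void main(String[] args) throws Exception {
            // A fictional address, following the MYUM.NET example above.
            System.out.println(fetchModel("http://myum.net/models/jane.rdf"));
        }
    }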

This concept of user model servers is starting to gain traction. Fink [FK00], in his review of four commercial user model (or personalisation) systems, noted that all were servers, rather than embedded systems. He noted a number of reasons for this. In addition to those discussed above, he listed:

Increased security A single, well managed repository of personal data is likely to be more secure than a multitude of scattered, inexpertly managed fragments.

Multi-computer usage A model held on a server can be shared by applications even when they run on different client machines. For instance, PCs at work and home can use the same model.

Smart appliances can gather and contribute a large amount of low grade data that can be mined for higher level beliefs [Orw96].

1.1.2 Differing Interpretations of Data

With a number of user model consumers using a shared user model server, there is a question of interpretation of the data. Different consumers may have different interpretations of the base data. In work on coaching users of the SAM editor based on their user model, different coaches had different interpretations of the evidence. One coach judged users to know about a function only after they had been logged as using it [PG00]. Another treated the user’s own assessment as the most reliable evidence [Kay99]. In this work, these differing interpretations became an asset.

Many models use a numeric value as the certainty of a belief. One application may interpret any value of 0.8 or more as true, while another may have a lower threshold of 0.7. This ambiguity is natural, but must be explicitly handled by the consumer, and indeed by the system as a whole. All consumers must be written with the understanding that other consumers may give different opinions. This can create problems for a visualisation, as there is no single notion of ‘truth’ to display. A user might be able to choose which interpretation they want a consumer to use, and any visualisation of the model must still be able to give an overview of the model using that interpretation.
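The following minimal Java sketch illustrates the point: two hypothetical consumers read the same numeric value from a shared model, but reach different conclusions because each applies its own threshold.

    // Sketch of the interpretation problem: the same shared belief value
    // is 'true' to one consumer and 'false' to another. Names are
    // illustrative, not from any system discussed in this thesis.
    public class Interpretation {
        interface ModelConsumer {
            boolean holdsTrue(double beliefValue);
        }

        public static void main(String[] args) {
            ModelConsumer strict  = value -> value >= 0.8; // one application's threshold
            ModelConsumer lenient = value -> value >= 0.7; // another's lower threshold

            double shared = 0.75; // the raw value held in the shared model
            System.out.println("strict consumer:  " + strict.holdsTrue(shared));  // false
            System.out.println("lenient consumer: " + lenient.holdsTrue(shared)); // true
        }
    }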

1.2 Summary and Thesis Overview

The major contributions of this thesis are:

• The creation of a visualisation for viewing user models with over 500 concepts. The visualisation provides:

– an overview of the values of all beliefs of the user model

– flexibility in the interpretation of the value of each belief

– navigation between related components in the model

– navigation to arbitrary components in the model.

• Evaluation of the effectiveness of the visualisation both overall and as a function of the model size, for

– navigation between related components

– navigation between nearly related components

– finding unrelated components by name

– finding components of a given belief strength

– finding components of a given belief certainty

– finding the component with minimum or maximum belief strength

– finding the component with minimum or maximum belief certainty

– combinations of the above.

• Implementation in an efficient manner that can be used practically on the World Wide Web.

• Implementation of a flexible user model encoding that allows for the use of different ontologies. Implementation in the Internet standard Resource Description Framework (RDF) makes this suited to being shared among tools on the World Wide Web.

The visualisation displays a network, or graph, and allows the user to overview, find relationships in, and filter the data in the user model. The graph can handle any set of components where each object has a ‘score’, or ‘belief value’, and a measure of the certainty of that data.

The VlUM visualisation

• shows focus+context. VlUM shows the whole graph at once, with focus on the selected component and related components. A student, for example, can get detail on a particular component, but can get an overview of the entire course at a glance.

• is compact. It can display a graph of more components than there are vertical pixels.

• shows the belief value for each component, a measure of the certainty of the belief, and the structure of the graph around the focused component.

• uses the ‘Dynamic Query’ technique to show the belief value in relation to an adjustable ‘standard’. It allows the user to decide what standard they expect, or want, and the display adjusts accordingly.

• can show comparisons between data sets.

The graph and data are encoded as a serialised RDF graph, enabling the display of any user model that can be expressed in these dimensions.
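To illustrate what a serialised RDF graph amounts to at the data level, the sketch below represents RDF statements as plain (subject, predicate, object) triples in Java and builds two statements of the kind a movie model might contain. The property URIs are invented for exposition; the schema actually used by VlUM is described in Chapter 5.

    import java.util.List;

    // Expository sketch: RDF reduces to (subject, predicate, object)
    // triples. These property names are invented for illustration and are
    // not the schema used by VlUM (see Chapter 5).
    public class TripleExample {
        record Triple(String subject, String predicate, String object) {}

        public static void main(String[] args) {
            List<Triple> model = List.of(
                new Triple("http://example.org/movie/TheMatrix",
                           "http://example.org/schema#score", "0.9"),
                new Triple("http://example.org/movie/TheMatrix",
                           "http://example.org/schema#peer",
                           "http://example.org/movie/Bound"));
            for (Triple t : model) {
                System.out.println(t.subject() + " --" + t.predicate() + "--> " + t.object());
            }
        }
    }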

Two domains have been modelled. The first took up to 700 movies from the Internet Movie Database (IMDB, http://www.imdb.com/), and modelled a hypothetical user’s viewing preferences for

those movies. The second domain was based on two years of data from an online assess- ment system used by a university medical faculty. A model for an average second year student was taken from the data. It contained some 540 components. Models of the second year cohort average, and a hypothetical ‘good’ student were also generated from this data.

In both these domains, a number of ways of relating the components into a graph were trialed. During this process a number of deficiencies in the authoring of the components themselves were identified and overcome.

Chapter 2

Background

Because this thesis concerns the visualisation of large user models within the World Wide Web, the work draws on a number of disciplines. I will start by reviewing the work on the visualisation of user models, and establish some requirements for visualising user models. Although a review of the field of information visualisation is beyond the scope of this thesis, I shall review the visualisations and techniques that have influenced this work. Finally, I shall review the architecture of the World Wide Web, and discuss the constraints it imposes and the advantages it offers software systems.

2.1 User Models

As stated in Section 1.1 on page 5, a user model is a representation of a set of beliefs about the user, particularly their knowledge in various areas, and their goals and preferences.

User models often exist within a system, but not as a distinct logical entity. For instance, a program may ask a user’s age, and at another stage decide that users over 50 years of age will be shown a particular string while others will not. This would be an implicit model stating that This system believes that users over 50 would like to know more about X. However, the model is not visible for inspection at any level, save an exhaustive examination of the application code.

An alternative is to design the software with an explicit user model. This model may be either cognitive or pragmatic [Kay99]. Cognitive user models are an attempt to model the way people actually think and know, and are of much interest to psychologists and educators. For instance, the ACT-R theory of skill acquisition [And93] can be used to define a user model for a teaching system [CA95]. Such a model is still an artificial construct of the psychologist-programmer. In many cases a more pragmatic approach is as effective, although such models lay no claim to cognitive validity. The user models in this thesis are of the latter kind, although the work presented here has a broad applicability, and could well have been used in conjunction with a cognitive user model.
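The contrast between implicit and explicit models can be made concrete in code. In the hypothetical Java sketch below, the first method buries the belief in control flow, while the second names it as a first-class object that a scrutiny tool could display; neither is code from any system discussed here.

    // The same adaptation written two ways. Both methods are illustrative
    // sketches only.
    public class ImplicitVsExplicit {
        // Implicit: the 'model' exists only as a magic number in the code.
        static String greetingImplicit(int age) {
            return (age > 50) ? "You may be interested in X." : "Welcome.";
        }

        // Explicit: the belief is a named, inspectable object that a
        // scrutiny tool could show to the user.
        record Belief(String statement, boolean appliesToUser) {}

        static String greetingExplicit(int age) {
            Belief b = new Belief("users over 50 would like to know more about X",
                                  age > 50);
            return b.appliesToUser() ? "You may be interested in X." : "Welcome.";
        }

        public static void main(String[] args) {
            System.out.println(greetingImplicit(55));
            System.out.println(greetingExplicit(55));
        }
    }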


Explicit user models have a number of important advantages.

• They force a system designer to model the end user. This avoids the common problem in interface development of assuming that the programmer thinks like the end user.

• They allow the leverage of user modelling shells and tools to help create, manage and use the model.

• They can be shared between user-adapted systems, thus freeing the user from re-teaching a new software system about themselves.

• They can be scrutable [Kay99].

For these reasons, a number of user modelling shells have been constructed. Such shells have a number of essential properties [Kay99].

• A representation for the user model components. These might include preferences, knowledge, goals and attributes, like age or sex.

• Support for the use of evidence about the user from sources external to the user modelling system.

• Support for inference within a model, so that beliefs can be used to infer additional beliefs.

• Mechanisms for managing inconsistency, noise, changes in the user model, and uncertainty.

• A level of abstraction that allows broad applicability, so the model can be reused by various user-adapted systems.

The work in this thesis operates as a consumer of a user model that may be presented by a user modelling shell. However, it too could be included in a shell that also offers

• Support for access and control (scrutability) of the model by the user.

In this respect, the work presented here can be considered part of a user modelling shell.

2.1.1 Scrutable User Models

In the scenarios given in Chapter 1, we see users scrutinising their user models. In the first example, Jane wants to know

• Why does it think I like The Matrix?

• Why does it believe I would like Circle of Friends?

• What is related to a movie I like?

• What else does it think I would like?

She finds the answers to these questions by using a tool that allows her to explore her user model. Similarly, in the second example our student Tom can find

• Topics which the model believes he will do better (or worse) than 75%.

• Topics in which he is doing very badly.

• How topics relate to each other.

• How he is faring in comparison to his cohort.

• How he is doing compared to the expectations of the faculty.

These latter questions relate to gaining an overview of the user model. In contrast, the work by Kay [Kay99] was more focused on exposing the fine details of the model. Kay’s work was more targeted at finding out ‘Why does it think that ...?’, while this work is capable of showing ‘Where am I failing?’, or ‘What do you believe I would like?’, and referring other questions to another part of the associated user modelling shell.

There are a number of motivations for scrutability in a user model:

Access to and control over personal information. As raised in the introduction, access to and control over personal information is seen as a moral issue, and is becoming a legal one. This personal information, which may have been collected by observation of or direct inquiry to the user, constitutes a database of personal information [Kob90, KKM93] which should be accessible.

Programmer accountability If a user model is implicitly woven into an application, then any personal information that model may contain is not accessible. This removes the system builders’ accountability for the contents of the model. Moreover, by allowing the user to see assumptions made by the programmer about their beliefs and actions, the programmer is encouraged to be more careful about those assumptions.

Correctness and validation of the model A system can never know everything about a user, or keep completely up to date. Allowing a user control over their personal data also allows them to validate and correct the model.

Machine predictability The more two people know about each other, the smoother they can potentially interact. Similarly, allowing a user to see a system’s beliefs can increase understanding of the system, and enhance the quality of the interaction [HKW+96].

Aid to reflective learning An important part of learning is meta-cognition, or thinking and learning about learning [Law84, Yus85]. Several researchers have argued that making a learner model available to the learner will aid meta-cognition [KC93, KG93, PSH95, Bul97].

2.1.2 Visualisation of User Models

There is no particular user interface defined for scrutable user models. However, the following requirements are likely:

Overview: The user will want to see an overview of the model, with the display reflecting the structure of the model.

Relevance: The tool will help the user identify and examine relevant parts of the model.

Navigation: The user will need to display more detail in some parts of the model, and less in others.

Component types differentiated: Where a model is showing different component types (knowledge, beliefs, etc), the type should be identified.

Component Values: The tool should show component values.

Component Value Confidence: The tool should show how confident the model is about the value for the component.

QV [Kay99] is an overview interface for the UM user modelling toolkit. An example of a QV window is shown in Figure 2.1 on the next page. In this example, QV is showing a user model for a user of the SAM text editor. The model is structured as a hierarchy, firstly into general areas such as editors and programming, then editors contains model segments for SAM, VI and EMACS. Most of these branches have been closed, so as not to discourage the novice user. The SAM structure has been opened to reveal further depth, namely useful, more useful, very useful, powerful and mostly useless commands and concepts.

The components themselves are labelled with their name, and a shape shows their type. A square indicates a knowledge component, diamond a belief, a circle indicates a partial model (or non-leaf node) and crosses indicate other component types. The filling of the shape is used to indicate the component value. For knowledge components, filled shapes are true, while empty shapes are false. Belief components are the opposite. For instance, a belief might be Evidence suggests that the user does not know how to use the scrollbar. Therefore, belief components are filled black for false, and white for true, the opposite of knowledge components. This allows the display to show all filled components for a knowledgeable user, and all unfilled for a complete novice. Nested shapes are used to show that the truth of the component could not be determined.
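The fill convention amounts to a simple rule, sketched below in Java (illustrative only, not QV source code). It shows why a fully knowledgeable user sees all shapes filled: knowledge components are filled when true, and belief components, which typically record what the user does not know, are filled when false.

    // Sketch of QV's fill convention as described above. The enum and
    // method are illustrative, not QV's actual implementation.
    public class QvFill {
        enum ComponentType { KNOWLEDGE, BELIEF }

        static boolean filled(ComponentType type, boolean value) {
            switch (type) {
                case KNOWLEDGE: return value;   // filled means 'known'
                case BELIEF:    return !value;  // filled means the belief is false
                default:        throw new IllegalArgumentException();
            }
        }

        public static void main(String[] args) {
            // A knowledgeable user: knowledge true, 'does not know X' false.
            System.out.println(filled(ComponentType.KNOWLEDGE, true)); // true (filled)
            System.out.println(filled(ComponentType.BELIEF, false));   // true (filled)
        }
    }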

When QV starts, all partial models (sub trees) with any true components are tentatively marked as visible and displayed. If there are too few components displayed, further partial models are displayed, favouring the simplest aspects of the model, until a quota is filled. Once the display has been drawn, clicking on a node with a closed sub-tree will cause the sub-tree to be displayed. Conversely, clicking on an expanded node will collapse it. A fully expanded user model for the SAM editor would be shown in QV as in Figure 2.2 on page 16.

Figure 2.1: The QV tool showing a user model for a user of the SAM text editor. Image courtesy of J. Kay

Figure 2.2: The QV tool showing a user model for a user of the SAM text editor with all branches expanded. Image courtesy of J. Kay

Figure 2.3: An example of a VISNET visualisation of a Bayesian Belief Network. Figure from [ZRNG99].

Explanation Tools

QV, and the work in this thesis, present an overview of a user model. Most work in visual- ising user models involves showing or explaining details about individual components.

XUM extends QV by adding a menu to the displayed components that allows the user to: justify the value of the component; alter the value of the component, by setting the value to true, false, or maybe; explain the meaning and purpose of this part of the model.

These menu options are entry points to a GUI for examining a UM model. One interface window can show the evidence for a component, and allow further inspection, or editing, of the evidence. Another window can show explanations by displaying the documentation text for an evidence source, or an inference rule.

Another system, VISNET [ZRNG99, GZROSC99, ZRG00], helps a user understand the workings of Bayesian Belief Networks (BBNs). BBNs offer a relatively intuitive approach to visualisation, where causes and effects are represented by circles and arrows, which are

Figure 2.4: The ‘viewer control panel’ from a viewer in TAGUS. Image from [PS95].

directed from each cause to its effects. Although simple static directed graphs can convey much information, they can be overwhelming. VISNET uses temporal order, colour, size, proximity and animation to help people understand BBN concepts such as marginal probability, changes in probability, probability propagation and cause-effect relationships. Several methods of visualising the BBN were trialed in [ZRNG99]. In Figure 2.3 on the page before we see a cancer-coma network in which both size and hue are used to show the ‘belief value’. As one belief changes value (for instance, if a brain tumour is discovered), then VISNET animates the flow of this change of probability through the rest of the network. VISNET is effective for explaining the behaviour of small BBNs to end users. However, it is not designed for offering an overview of a model. Nor is it suitable for displaying large models. As an explanation tool, it makes no attempt to allow user control over the model.

Paiva & Self [PS95] allow the system builder to edit user models with graphical tools in TAGUS. The tool, shown in Figure 2.4, is not targeted at end users. It concentrates on allowing the system builder access to the details of the model, and offers no overview capability.

Like the editors in TAGUS, graphical tools to access user models in BGP-MS [KP95] are intended to help the system builder define and manipulate models. The ‘partition editor’ shown in Figure 2.5 on the next page can be used by the programmer to define aspects of the model, but is not available to, and would probably not be useful for, an end user.

Figure 2.5: The ‘partition editor’ from the BGP-MS system. Image from [KP95].

2.2 Information Visualisation

Graphics is the visual means of resolving logical problems. Bertin, 1974, p 16

The power of the unaided mind is highly overrated .... How have we increased memory, thought, and reasoning? By the invention of external aids: It is things that make us smart. Norman, 1993, p 4

The VlUM tool described in this thesis allows a user to find relevant answers from a large body of data by manipulating a graphical display of the data. The use of graphical representations of data in problem solving has a long history. For instance, in 1613 Galileo used small multiple diagrams to show that sun spots were indeed spots on the face of the Sun [Tuf83]. More recently, it has been suggested that the Challenger shuttle disaster could have been prevented if the right display of data concerning booster o-ring damage had been used when deciding whether to proceed with the launch [Tuf97]. The computing power available to modern scientists has accelerated the use of graphics to display data. The scientific visualisation literature covers areas such as the display of airflow in weather systems [TR97, SHJ94] and medical applications that allow 3D rendering of the human body [NSP96].

It has been noted by Herman et al. [HMM99] that Scientific Visualisation is usually closely related to mathematical structures and models. This offers an inherent geometry on which to base a visualisation. For instance, given a high end graphics workstation and some

programmer time it is not difficult to find a good graphical representation of airflow over a wing. The very real nature of the data offers immediate suggestions. In contrast, finding a good graphical representation of the complete works of Shakespeare [Sma96], or films made in the last century [AS94], requires more work. Furthermore, most scientific applications of visualisation methods can assume a motivated and expert user. This is not true of the more general ‘Information Visualisation’ techniques which are used in domains with possibly novice users. For instance, Ahlberg’s FILMFINDER was designed for possible use in video stores.

There have been a number of attempts at defining a theory of information visualisation [Ber81, Tuf83, Tuf90, Tuf97]. In a more recent work, Card et al. [CMS99] define ‘Information Visualisation’ as ‘the use of computer supported, interactive, visual representations of abstract data to amplify cognition’. They go on to define a reference model for information visualisation. In this model

• raw data is transformed into ‘data tables’ consisting of structured data, relations and metadata.

• The data can then be mapped into ‘visual structures’ which consist of a spatial sub- strate, marks, and graphical properties.

• These visual structures are then transformed into ‘views’ by specifying graphical parameters such as position, scaling and clipping.

At any of these stages, users can control the visualisation by adjusting parameters, restricting the view to certain data ranges, or changing the nature of the transformation. The visualisation and its controls are used in the service of some task.

In their analysis, Card et al. give an inclusive list of eight examples of data tables, eleven examples of visual structure, four views, three types of human interaction and eleven tasks a user might want to accomplish with a visualisation tool. Clearly there is a multitude of possibilities. In the following sections, I shall review techniques that have informed the design of the VlUM display.

2.2.1 Visual Properties and Coding

Visual properties such as colour can encode abstract information that would otherwise be unseen. This is a simple and long recognised fact that has been used in nearly all visualisa- tions, and in graphics work before computerisation. There are several properties that can be used to encode extra information into a data point. Colour and size are useful for encoding continuous variables, while shape can encode discrete values. They may of course be used together to encode multiple attributes of the data on one point, or mark, on the screen.
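As a concrete example of encoding a continuous variable in colour, the sketch below maps a value in [0, 1] onto a red-to-green ramp. The particular mapping is an assumption for illustration, not one taken from any of the systems reviewed here.

    import java.awt.Color;

    // Illustrative sketch: encoding a continuous value in [0, 1] as a
    // colour on a red-to-green ramp.
    public class ColourCoding {
        static Color encode(double value) {
            double v = Math.max(0.0, Math.min(1.0, value)); // clamp to [0, 1]
            int red   = (int) Math.round(255 * (1.0 - v));
            int green = (int) Math.round(255 * v);
            return new Color(red, green, 0);
        }

        public static void main(String[] args) {
            System.out.println(encode(0.0)); // pure red
            System.out.println(encode(0.5)); // mid-ramp
            System.out.println(encode(1.0)); // pure green
        }
    }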

An early example of a computer visualisation that encoded data in visual properties was SEESOFT [ELE92], which showed all the lines of code in a complex piece of software. Each module was shown in a vertical column. Columns too long to fit on the page were split and the remainder placed next to the original column, as can be seen in Figure 2.6. The columns represented listings of the program code in text too small to read (about one line per pixel), but large enough to allow colour coding based on a number of criteria. For instance, if multiple programmers worked on the code, one could colour the lines according to author. This gave an immediate overview of code authorship of the whole program. Alternatively, the colour of the line might indicate the age of the code, or even the number of times a line is executed in a typical run of the program.

Figure 2.6: A screenshot of the SEESOFT visualisation. In this example, darker lines have been executed more times in an execution of the program.
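Colour coding of this kind amounts to a mapping from a per-line attribute to a colour. The following is a minimal sketch of such a mapping, assuming Java; the linear interpolation from white to a dark hue is illustrative, not SEESOFT’s actual scheme:

import java.awt.Color;

public class LineColourCoder {
    private static final Color LEAST = Color.WHITE;          // e.g. never executed
    private static final Color MOST  = new Color(96, 0, 0);  // most executed (darkest)

    /** Linearly interpolates between LEAST and MOST for a count in [0, max]. */
    public static Color colourFor(int count, int max) {
        float t = (max == 0) ? 0f : Math.min(1f, (float) count / max);
        int r = (int) (LEAST.getRed()   + t * (MOST.getRed()   - LEAST.getRed()));
        int g = (int) (LEAST.getGreen() + t * (MOST.getGreen() - LEAST.getGreen()));
        int b = (int) (LEAST.getBlue()  + t * (MOST.getBlue()  - LEAST.getBlue()));
        return new Color(r, g, b);
    }
}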

2.2.2 Dynamic Queries

Sometimes it is useful to highlight a particular subset of data within a large data display. Static paper-based examples might be maps which highlight particular roads to indicate a route to a set location, or application forms that highlight the more important information. These have their direct analogies in computer systems, but it is possible to do more with a computer. By allowing a user to dynamically adjust query parameters, and instantly see the results of the changed query, a dynamic query interface allows a user to explore the data in a very efficient way.

Figure 2.7: A screenshot of a dynamic query tool for exploring the periodic table of elements. Image courtesy of B. Shneiderman, http://www.cs.umd.edu/hcil/spotfire/.

Figure 2.8: Dynamic query tools often use a double slider to allow the user to specify a range of values.

A dynamic periodic table of elements [AWS92], shown in Figure 2.7, displays the classic table, but allows the user to dynamically highlight the elements that match a set of properties. For instance, a user can highlight all elements with an atomic mass within a certain range, and an ionization energy within another range. Other properties available to the query are atomic number, atomic radius, ionic radius and electronegativity. All these properties can be set by using slider controls at the bottom of the screen. Often a user will want to specify a range of values: for instance, searching for elements with atomic numbers between 17 and 26. For this, a double slider is used. An example can be seen in Figure 2.8. In the figure, the slider starts by encompassing the entire range (in this case, 1–824). A user may adjust the two ends of the range, and on the right of the figure a user has set the slider to select values between 239 and 705.
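A dynamic query of this kind amounts to re-filtering the data set against the slider bounds on every adjustment. A minimal sketch, assuming Java; the Element record and its fields are illustrative, not taken from the system above:

import java.util.ArrayList;
import java.util.List;

public class DynamicQuery {
    /** An illustrative record for one element of the periodic table. */
    static class Element {
        final String name;
        final int atomicNumber;
        Element(String name, int atomicNumber) {
            this.name = name;
            this.atomicNumber = atomicNumber;
        }
    }

    /**
     * Called whenever the double slider moves: returns only the elements
     * whose atomic number lies within the selected [low, high] range.
     */
    static List<Element> select(List<Element> all, int low, int high) {
        List<Element> matching = new ArrayList<Element>();
        for (Element e : all) {
            if (e.atomicNumber >= low && e.atomicNumber <= high) {
                matching.add(e);
            }
        }
        return matching;
    }
}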

The DC HOMEFINDER system, shown in Figure 2.9 on the next page, shows a map of the Washington DC area that indicates all the homes for sale in that area. There is too much information on the map for it to be useful directly. Instead, the user operates the provided controls to indicate which data to show. For instance, setting a price range shows only the houses within that range. Marking a point on the map shows only the houses within a certain distance of that point. Other variables such as the number of bedrooms may also be specified. Such a query reduces the number of houses shown to a manageable size. If the query does not present an appropriate house, the user can decide whether to relax some of the constraints.

Ahlberg [AS94] gives a further example. The FILMFINDER (Figure 2.10 on the facing page) shows all films in its database on a scatterplot of production date against film popularity.

Figure 2.9: A screenshot of HOMEFINDER visualisation and dynamic query tool. Image courtesy of B. Shneiderman, http://www.cs.umd.edu/hcil/spotfire/

Figure 2.10: A screenshot of FILMFINDER visualisation and dynamic query tool. Image courtesy of B. Shneiderman, http://www.cs.umd.edu/hcil/spotfire/.

Figure 2.11: A screenshot of SPOTFIRE visualisation and dynamic query tool exploring an abstract data set of biochemical and pharmaceutical companies.

This creates a display too cluttered to find any particular film. A user may then query the database using sliders and checkboxes to set the values of attributes to be shown. Only films that match these values are shown, and so by restricting these values the display becomes less cluttered until the number of movies shown becomes quite manageable. Attributes that may be adjusted are title, actor, actress, director, length and rating.

The ideas explored in FILMFINDER were commercialised in the SPOTFIRE∗ product. SPOTFIRE allows the dynamic querying of large, arbitrary data sets. For instance, in Figure 2.11, the tool is being used to show the attributes (revenue, netIncome, shares, marketCap, pricePerSales, etc.) of a large number of companies involved in biotechnology and drugs. In the figure, netIncome is plotted on the y axis, and revenue on the x. Colour codes the type of company, while mark size shows yet another attribute. Double sliders are used to adjust a number of query variables, and the number of records shown is displayed in the status bar at the bottom of the display.

It should be noted that the results of a visual query need not be displayed graphically. One example [LOS92] showed a listing (ls -l) of a UNIX directory, and asked users ‘how many files are younger than umcp-tai’. The user was able to adjust sliders to select files of a particular size or age. Selected files were shown either by a different colour, by an asterisk next to the name, or by removing non-matching files from the listing. In user tests the third method (removing non-matching files) had a statistically significant speed advantage.

∗http://www.spotfire.com/

2.2.3 Focus + Context and Distortion

A touring map used when driving between, say, Sydney and Canberra will often include a small-scale map of the entire journey, but then provide a number of larger-scale maps of denser regions like towns or cities. This is a simple static attempt to provide both a view of the entire trip, and also finer views of areas for which the user is likely to need more information. This can be called finer focus on areas of interest. In addition, some maps have a means of showing the location of these larger-scale maps on the small-scale general map, in order to give context to the focused maps. Together, these provide the foundation of focus+context visualisation techniques, where the area of interest is shown in detail, but the larger picture is also shown to provide easy change of focus and the context of the detail.

The problem with placing a small ‘focus’ map on the large ‘context’ map is that part of the context is obscured. Also, it is not always clear how the focus map relates to the context. Distortion techniques solve these problems: they take the plane of data and distort it to enlarge the focus, while retaining a view of the context. An example may be seen in Figure 2.12. In the figure, there are five foci, which are enlarged to varying degrees, while the surrounding context is compressed but still visible.

Figure 2.12: A grid showing some nonlinear distortion. Five foci are shown.

In their taxonomy of distortion-oriented presentation techniques [LA94], Leung and Apperley give a lengthy discussion of the history and scope of work in this field. Some notable examples are given here.

The first rigorous mathematical treatment of a polyfocal projection was given by Kadmon and Shlomi [KS78]. In their work, the curvature of the magnification function is controlled by two sets of parameters; one controls the magnification at the point of focus, and the other the rate of change of magnification with distance from the point of focus.

A ‘fisheye’ display may be considered a special case of a polyfocal projection with a particularly disjoint magnification function that shows the current focus, and then only ‘landmarks’ of the context. Furnas’ [Fur81] ‘fisheye’ view of program files operated only on character displays, and so was text-only. It defined a Degree of Interest (DOI) function for determining the visibility of parts of a file. The current line was of most interest, as was anything in the immediate context. For instance, a fisheye editor might show the current block in full, but only the enclosing control structures of the surrounding block, and then only the names of other functions in the file.
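Furnas formalised this idea as a degree of interest function. The following formulation is a sketch of the one usually quoted in the fisheye literature, not a verbatim quotation of [Fur81]:

DOI(x | focus = y) = API(x) − D(x, y)

where API(x) is the a priori importance of a point x (for program text, for instance, how shallowly it is nested in the file), and D(x, y) is the distance from x to the current focus y. Only points whose degree of interest exceeds some threshold are displayed.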

Simple magnification functions can be effective. Bederson’s FISHEYE MENU [Ben00] uses a simple magnification function with a minimum font size, a maximum font size, and a relatively short and linear transition between the two. This function is used to help show the focus+context of a long linear menu that would otherwise be too long to show on a single screen. The result may be seen in Figure 2.13. In this figure, we can see that all items are listed at once, but only the focus region is truly visible. Between the background and focus regions is a short area of transitioning magnification. The focus region may be extended by using the arrows on the right of the menu. The menu is also in alphabetic order, and so a simple indexing scheme in the left margin can help speed searching. In user tests, fisheye menus were preferred over other types of menus for browsing tasks, and have been used with 266 elements in the menu.

Other techniques exist for mapping a distorted display to a pane. One is to use an alternative geometry. For instance, hyperbolic geometries have the convenient property that the circumference of a circle grows exponentially with its radius, which means that exponentially more space is available with increasing distance. Also, the hyperbolic plane, a mathematical abstraction, can be mapped in a natural way onto the Euclidean unit disc. This disc is suitable for display on a conventional screen surface. When mapped to the Euclidean unit disc, the ‘centre’ of the mapping appears at a normal size at the centre of the disc. However, as the edge of the disc is approached, features are exponentially reduced in size. Any point on the hyperbolic plane may be mapped to the centre of the Euclidean disc, allowing the tree (or whatever is plotted) to be explored.

Figure 2.13: A FISHEYE MENU.

Lamping [LR96] used a hyperbolic geometry to show large trees. The Lamping browser was commercialised by Inxight† and has become a general tool for viewing graphs, although trees are more commonly displayed. As seen in Figure 2.14 on the facing page, the current focus of the visualisation can be easily seen, while the context (nodes somewhat removed from the current focus) is still drawn, albeit at a diminishing scale. In Figure 2.14 on the next page, a tree of recipes is shown. The focus of the hyperbolic plane is somewhat above and to the left of the ‘all recipes’ icon. The focus may be moved by ‘dragging’ the pane with the mouse, or single-clicking the mouse on the point you wish to focus on.

Another alternative geometry is to move from a planar world to a three-dimensional projection. This is the technique used in THE PERSPECTIVE WALL, shown in Figure 2.15 on the facing page. THE PERSPECTIVE WALL shows data that can be ordered in some linear way. The data is placed on a ‘ribbon’, which is ‘standing on its side’. To keep the ends of the ribbon in view, they are positioned in a 3D space to appear to be ‘folded’ away from the viewer.

†http://www.inxight.com/

Figure 2.14: The Inxight STARVIEWER hyperbolic browser showing a tree of recipes.

Figure 2.15: A screenshot of THE PERSPECTIVE WALL.

Figure 2.16: A screenshot of THE TABLE LENS analysing mutual fund performance.

The perspective projection onto the viewable 2D plane then gives the focus+context effect. In Figure 2.15 files are shown arranged by creation date on the (long) x axis. Different types of files are placed on different ribbons, stacked on each other, thus making better use of the y axis of the plane. This use of the ordering and perspective properties of three dimensional drawings has also been used in displays like CONE TREES [RMC91].

THE TABLE LENS [RC94] uses an alternative encoding, rather than an alternative geometry, to achieve a focus+context effect. The system is for visualising and making sense of large spreadsheet-style tables. It uses focus+context techniques to allow interaction with large information structures by dynamically distorting the spatial layout of the table according to the varying interest levels of its parts. Most tables tend to be too long to fit on a computer screen. THE TABLE LENS overcomes this by offering alternative, more compact representations of cell values if the cell is not in ‘focus’. For a simple number, that alternative representation might be a horizontal bar of length proportional to the value in the cell. A column of such horizontal bars becomes a very effective visualisation, as it shows visually the relative values of the cells in the column. As can be seen in Figure 2.16, sorted columns ‘look’ sorted, and a glance can show possible relationships with the values in other columns. Individual rows, or groups of rows, may be magnified (focused), returning them to their symbolic representation.

In Figure 2.16 the performance of mutual funds is being analysed. The funds are currently sorted on their 5 year performance, and two particular entries (one under-performing, and one over-performing) have been expanded. Comparison with other columns shows that the recent collapse of tech stocks has temporarily reversed the fortunes of some of the harder players over the last quarter, although the last month has seen some return to form.

It is also possible to use alternative 3D geometries. Munzner [Mun00], in the H3 system, has extended the concept of the hyperbolic browser to show a three dimensional hyperbolic space in a three dimensional sphere, which is then rendered using standard 3D graphics techniques on a display. The result can be seen in Figure 2.17 on the next page. This arrangement has been used to display a network of 10,000 nodes.

Figure 2.17: A screenshot of a hyperbolic space graph layout. Image courtesy of T. Munzner.

In the H3 system, users can explore the space by moving the sphere.

DEXTER [Mur96] was developed to visualise and navigate graphs of related ‘story elements’. It was used in a number of projects within the MIT Media Lab, most visibly the presentation “Jerome Wiesner: A Random Walk through the Twentieth Century” [JBW99]. The main point of interest for this thesis is its vertical display of items, the Materials Listing, which is the vertical list of words in the centre of Figure 2.18 on the following page. This is a list of the complete set of documents available to the viewer, connected by what is essentially a graph. Titles may be selected by mouse click. The selected title then moves to the top of the display, and becomes the root of a spanning tree of titles. Other titles become brighter and are given more space in the vertical column when they are closer to the root of the tree. The actual algorithms for layout of this list are not clearly described in the literature. The JBW presentation contains 74 elements within the materials listing and seems to be usable with this number, but there have been no recorded user evaluations of the tool.

Figure 2.18: DEXTER after selecting a few titles in the JBW presentation. Notice the start of a history. Also notice some depth in the displayed tree. The title ‘such a force’ is brighter than ‘a thousand souls’, but not as bright as the selected title.

2.2.4 Animation

The human brain has a well developed spatial processing ability. We can track the position of many objects at once, provided the objects obey the laws of naive physics. For instance, if we play a shot in snooker, moving four balls, anyone who saw the shot will be able to point out which balls moved. People who had turned away for a moment (to visit the bar, say) will often have trouble pointing out the difference in ball positions. Animation allows our brains to track changes in spatial configuration.

Methods of tricking the human brain into thinking that a sequence of still images is in fact a moving scene have been well studied. The movements of Bugs Bunny and Wile E. Coyote, although amusingly defiant of Newtonian physics, are easy for the human brain to follow. Any sequence of images shown to a human at greater than ten frames per second will simulate movement [CMN83]. More advanced techniques like motion blur [CU95] are not yet commonly used in information visualisation.

The cognitive co-processor module in the INFORMATION WORKSPACE [RCM93] attempts to ensure that these human-centred timings are observed. It ensures that animations run at at least ten frames per second, but take about one second to complete, regardless of system load.

Taking a lead from this work, almost all visualisations that allow for interaction use some degree of animation to help the user follow points of interest. The hyperbolic browsers reviewed here, THE PERSPECTIVE WALL, DEXTER and THE TABLE LENS all use animation to some extent.

2.3 The World Wide Web

The tools described later in this thesis are designed to operate within the World Wide Web. In particular, VlUM can be used within a web browser, and stores the user model in an Internet-standard format on a web site. This gives the tools flexibility and utility, but does impose some rigid design constraints. Here I describe enough about the World Wide Web for completeness. This section describes the underpinnings of HTTP, URLs and XML. Section 2.4.3 on page 38 is a description of the Resource Description Format in which the VlUM user model is stored.

The World Wide Web [BLCL+94] is a distributed hypertext system. Hypertext, or Hypermedia as it is sometimes known, is a combination of pages, or components, and links between the components [AMY88, Con87, Goo87, HMT87]. The essential features of a hypertext system have been captured in the Dexter Hypertext Reference Model [HS94]. Without delving too deeply into what is now common knowledge, hypertext consists of (in the terms of the Dexter model) Components that contain text, graphics, or other media that form the basic content of the hypertext network. Links between components are also viewed as components in the model, and can be specified as one way, two way, to one or more components, to a whole component or a subrange within the component, and with presentation specifications. Presentation specifications take care of cases where different links to the same component may trigger different properties. For instance, a normal link to a component may just view the component, while a link from a special “teacher’s page” may allow editing the component.

The Dexter model requires that a hypertext system enforces a number of constraints, one of which is that the hypertext must be link-consistent: that is, all links must be connected to existent components. If a component is deleted, then all links involving that component must also be deleted to maintain link-consistency. The World Wide Web has achieved its scalability by dropping this requirement. Instead, the user agent (browser) must report linking errors to the user. This strategy has clearly been successful. The number of component nodes in the Web has grown exponentially since 1990, and a top web index, GOOGLE.COM‡, had recorded 1,326,920,000 distinct pages by 4/12/2000. This is certainly a low estimate, and excludes many adaptive web pages and pages excluded from the index by web site administrators.

The Web requires all clients and servers to be inter-operable. This has been achieved by the public specification of three core components: Data, Transfer, and Names.

2.3.1 Data

The basic data type of the Web is now the Extensible Markup Language [XML99a] (XML), a simplification and reformulation of the Standard Generalised Markup Language (SGML) [ISO 8879]. XML is a character stream, encoded in Unicode [ISO 10646], that

‡http://www.google.com/

contains nested character strings, delimited by tags. The tags (in the form of <tag>...</tag>) give meaning to the enclosed text. In practice, this allows a document to contain a hierarchical structure that may be validated against a Document Type Definition (DTD) or XML Schema [XML01]. For instance, we could define an XML document type that encoded information about a person:

<person>
  <surname>Uther</surname>
  <givenname>James</givenname>
  <phone>+358 40 704 3033</phone>
  <homepage>http://people.gmp.usyd.edu.au/hemul/</homepage>
</person>

This is a well formed XML document, meaning that all opened tags are closed, and none overlap. If we had a schema or DTD for this document, we could validate it against that, and if it was correct, the document would be both well formed and valid. Any XML parser can parse a well formed XML document, and optionally check the validity of the values and structure against the DTD. However, the semantics of the tags and their values must be known to the consuming application if it is to do anything clever with them. For instance, the simple XML example above is one way of describing a person, but to a given web browser it will just be an abstract tree of XML. However, to an appropriately programmed address book application it would describe useful attributes of a ‘person’ entry in the address book.
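As an illustration of the parsing step, the following is a minimal sketch of loading the hypothetical person document above with the standard Java XML (JAXP) API; the file name is assumed:

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ParsePerson {
    public static void main(String[] args) throws Exception {
        // Any conforming XML parser can build a tree from a well formed document.
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new File("person.xml"));

        // The parser yields only an abstract tree; it is the application
        // that knows what a <surname> element means.
        Element root = doc.getDocumentElement();
        String surname = root.getElementsByTagName("surname")
                             .item(0).getTextContent();
        System.out.println("Surname: " + surname);
    }
}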

Fortunately there are many openly specified tag sets, including sets for marking up simple human-readable documents (XHTML) [XHT00], mathematical formulæ (MathML) [Mat01], vector graphics (SVG) [SVG00], metadata (RDF) [LS99], time based multimedia (SMIL) [SMI98], stylesheets (XSL) [XSL00], and many others.

XML Namespaces

All these types of markup mean that there are many XML tags defined. To prevent the fracturing of web data into untold incompatible and overlapping sets of tags, XML documents can use XML Namespaces [XML99b] to allow tags to be taken from common tag sets, and yet not conflict with similar names in other tag sets. An XML Namespace is a URI, and is combined with tags from that namespace into a tuple. For instance,

<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xhtml:p>some text</xhtml:p>
</svg>

shows an <svg> element that is disambiguated from any other usage of that tag name by marking it as coming from the namespace http://www.w3.org/2000/svg. This means that tags in that element are known internally to the parser as a tuple

< http://www.w3.org/2000/svg , svg >

Similarly, a second namespace is defined for the ‘xhtml’ prefix, http://www.w3.org/1999/xhtml. Thus the <xhtml:p> tag is known internally as

< http://www.w3.org/1999/xhtml , p >

Although the namespace is a URI, according to the namespace specification it is not guaranteed to point to a web resource.

The Hypertext Markup Language

The Hypertext Markup Language [HTM99] (HTML) is an XML document type¤ for encoding human-readable documents. It operates at about the same semantic level as LaTeX [Lam94], allowing the author to specify section headings in six levels, emphasise text, show quotes and generally define which parts of the document should be shown in special ways. It is up to the browser to render the document as it sees fit, based on these hints. This may involve display on a small or large screen, or even reading the document to a user in a car, or to a person with impaired sight. HTML documents may contain references to other documents on the Web. These might be requests for the browser to include other web data in the current page (adding a picture for example), or to make a section of the document a link to another Web document. In general, clicking on such a link will cause the browser to fetch and display the new document, possibly handling or reporting any errors encountered in the process. A simple HTML page would be

<html>
  <head>
    <title>My Home Page</title>
  </head>
  <body>
    <h1>James' Home Page</h1>
    <p>This is the home page of James Uther</p>
    <p>It has a title, a heading in the body, and two paragraphs.</p>
  </body>
</html>

which is displayed in a Netscape browser as in Figure 2.19 on the following page.

¤HTML was originally an SGML document type. XML was used as of HTML version 5 (XHTML). Constructs legal in SGML but not in XML, such as implicit tag closure, were deprecated.

Figure 2.19: Some HTML rendered by a Netscape browser.

2.3.2 Transfer

The Web uses a single protocol for communication between clients and servers. The Hypertext Transfer Protocol (HTTP) is optimised for quick servicing of requests for single hypertext nodes (web pages). In its simplest form it operates as follows:

1. The client sends a request, with optional RFC822-encoded parameters, to the server:

GET /documents/page.html HTTP/1.1
Host: pgrad.cs.usyd.edu.au

2. The server responds with a status line, RFC822-encoded response headers, and a message body:

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 234

...
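The same exchange can be driven programmatically. A minimal sketch using the standard java.net classes, with the request URL taken from the example above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SimpleGet {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://pgrad.cs.usyd.edu.au/documents/page.html");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET"); // the request line shown above

        // Print the status code and the message body of the response.
        System.out.println("Status: " + conn.getResponseCode());
        BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
    }
}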

This simple base allows implementation of web servers and clients in devices of almost any size. However, the protocol as specified allows more complexity. A pertinent example is authentication.

Many pages on the Web are intended to be read by a limited group of people. A web client must be able to prove who it is to a server so that the server can decide whether the client is allowed to access the requested page. The problem of authentication over networks is well studied and many solutions exist [NSS88, C.C88]. HTTP allows clients and servers to implement any solution; however, only one, known as ‘Basic Authentication’, is widely implemented. The general mechanism is as follows:

1. A client requests a resource using the normal mechanisms.

GET /documents/page.html HTTP/1.1
Referer: http://www.gmp.usyd.edu.au/index.html
If-Modified-Since: Sun, 28 Apr 2001 13:12:40 GMT
Host: pgrad.cs.usyd.edu.au

2. The server replies that access is denied unless a valid authentication token for the given ‘authentication realm’ is presented

HTTP/1.1 401 Unauthorized
Date: Sun, 29 Apr 2001 13:12:30 GMT
WWW-Authenticate: Basic realm="GMP"

3. The client responds with the same request as before, but adds the correct authentication token for the realm. In the ‘Basic’ scheme this is a base64-encoded username:password pair:

GET /documents/page.html HTTP/1.1
Referer: http://www.gmp.usyd.edu.au/index.html
If-Modified-Since: Sun, 28 Apr 2001 13:12:40 GMT
Host: pgrad.cs.usyd.edu.au
Authorization: Basic T7sK2X8pkQ7M7pdk2hcpSk==

4. The server checks the token, and if correct returns the page. If the token is invalid, the access denied response is given again.

5. The client then usually remembers that this token is required for that realm, and sends the token automatically for the rest of the browsing session.

The ‘Basic’ scheme is weak, in that it passes the secret password in what is effectively cleartext over the network, where it can easily be seen by anyone eavesdropping. Transactions requiring real security usually rely on a lower-level scheme such as IPSec [RFC2401] or Transport Layer Security (TLS) [RFC2246].
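Constructing the token used in step 3 is straightforward. A minimal sketch in Java, using the java.util.Base64 class from later Java versions; the username and password are placeholders:

import java.util.Base64;

public class BasicAuth {
    /** Builds the value of the Authorization header for Basic authentication. */
    static String basicToken(String username, String password) {
        String pair = username + ":" + password;
        // base64 merely re-encodes the pair; it is trivially reversible,
        // which is why the scheme is considered weak.
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes());
    }

    public static void main(String[] args) {
        System.out.println(basicToken("username", "password"));
    }
}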

It should be noted that HTTP contains other operations (PUT, DELETE) and options for finer cache control and content negotiation, and that newer work extends it into the distributed authoring space by adding locking and versioning capabilities (WebDAV [RFC2518]).

2.3.3 Names

All resources on the Web are named by a Uniform Resource Identifier (URI). A URI is unique within the Internet, but does not necessarily give a location for the resource. A common sub-species of URI, the Uniform Resource Locator (URL) does give a location for a stream of bits. A URL is a string, encoded in ASCII with escape sequences for other characters, with three sections

<scheme>://<host>/<identifier>

A simple example would be a document on the host www.gmp.usyd.edu.au, accessed by HTTP, called /page.html:

http://www.gmp.usyd.edu.au/page.html

The identifier within the host (/page.html in this example) may be any path understood by the host, expressed in the character set available to URLs. A more complex, but still quite valid, example would be /Department%20Documents/Research%20Report.html, a document within a directory, with the characters %20 exchanged for the space character as required by the URL specification.
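The three sections of a URL can be recovered with the standard java.net.URL class; a small sketch using the example URL above:

import java.net.URL;

public class UrlParts {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.gmp.usyd.edu.au/page.html");
        System.out.println("scheme:     " + url.getProtocol()); // http
        System.out.println("host:       " + url.getHost());     // www.gmp.usyd.edu.au
        System.out.println("identifier: " + url.getPath());     // /page.html
    }
}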

A URI may be used in hypertext documents to identify target documents in hypertext links. We could refer to the page specified above as an HTML link with the following text

Here we have <a href="http://www.gmp.usyd.edu.au/page.html">a link</a> to another page

A browser displaying this HTML would interpret a click on the text ‘a link’ as a request to follow the link to the specified page.

2.3.4 Summary

These three components specify the operation of the World Wide Web. No matter how a server finds stock quotes, or accesses wire services, it must present the results over HTTP in XML (or take chances with a widely used data type like PNG). Any resources it refers to must be named by a URI, and if the URI is a URL, then the resource may be retrieved.

2.4 Knowledge Representation on the Web

For the tools in this thesis to operate seamlessly with other web clients, the user model had to be represented in a Web standard for knowledge representation. This standard is the Resource Description Format (RDF). This section reviews RDF, starting with its roots in the description of metadata, which is a simple sub-field of knowledge representation.

2.4.1 Metadata

Metadata is “data about data”. Common examples are library catalogues or television guides, which contain data about books available at the library, or programs due to show on television during the week. It can be seen as a summary of the salient features of an object, and so is often used to help manage collections of data. In the context of a library it is easy to see how a database of the title, author and location of each book can aid in managing and using the library. Indeed many of the standards for metadata are driven by the needs of libraries.

One of the problems with many metadata schemes is that they can only encode with a fixed ontology, with no scalable mechanism for recognising other ontologies. For instance, Alice’s documents might use the ‘Author’ attribute, while Ben’s use the ‘Creator’ attribute; a search engine knows of neither of these and expects the author to be called the ‘Writer’. This problem is multiplied by the size of the Internet. The solution has taken two steps:

1. Develop an extensible language for specifying metadata that allows mixing of ontologies.

2. Define and popularise standard ontologies for common tasks.

The first major effort at this was the Warwick Framework [Lag96], which proposed an encapsulation scheme. A document could be categorised with metadata according to a number of ontologies, each set forming a package. Packages could be collected, possibly recursively, into containers. This allowed a document to be categorised according to, for instance, both the Dublin Core set and the Anglo-American Cataloguing Rules (AACR2), and a metadata consumer could use whichever set it understood.

While the Warwick Framework went some way to solving the problem of mixing ontologies, it required the document to be categorised in multiple systems, with redundant metadata. For instance, it was possible that the document could have three ‘Author’ attributes from three different metadata standards. Even more problematic was that the three definitions of ‘Author’ could all be subtly different.

This problem, and others, is solved in the Resource Description Format (RDF) from the World-Wide Web Consortium, described in Section 2.4.3 on the next page. It allows resources to be described with attributes from any ontology, or even from multiple ontologies. The use of the XML namespace mechanism ensures that all attributes are unique, and that a description of the precise meaning of the attribute can be found. Resources and metadata terms are all labelled with URLs, and so it is even possible to state, in RDF, that two metadata terms from different standards are synonyms. RDF is currently being used as a platform for indexing information, privacy information, intellectual property rights labelling, and PICS (Platform for Internet Content Selection) [PIC99] ratings information.

2.4.2 Ontologies

Once we have a standard set of attributes with which to label an object, it is useful to have standard ontologies for some of the categories. Good examples of this may be found in the field of medicine. The medical profession has a strong need to keep their terms in order. A researcher writing a paper about a particular disease must publish the paper with the correct and accepted term for that disease in the metadata if it is to be found by doctors treating people with the disease. The Medical Subject Headings, or MeSH [mes01], are such a set of standard terms.

Another commonly used ontology is the International Classification of Diseases [icd92]. This World Health Organisation sponsored list of diseases provides a standard for writing patient records and articles, and generally for communications within the medical community.

2.4.3 Resource Description Format

Possibly the most common assertion for metadata is something of the form “The author of http://www.site.org/page.html is James Uther”. However, this is only the simplest example. This assertion is itself data, and can have a URI. Given that the Web supports and encourages links between resources, it makes sense to be able to refer to other statements. Therefore one might also want to express that

This thesis states that (“The author of http://www.site.org/page.html is James Uther”)

Similarly, the entity “James Uther” may have some further metadata associated with it, such as email addresses or telephone numbers.

The Resource Description Format (RDF) is a method for describing arbitrary graphs of metadata about any resource that can have a URI. The format is general enough to be used for simple knowledge representation work. It has been recommended by the World Wide Web Consortium (W3C) as the metadata standard for the Web. It is a standard way to represent graphs of metadata, and a formalism for unambiguously serialising that graph as an XML document. The graph, or data model, represents assertions about resources within a Directed Labelled Graph (DLG). Two RDF assertions are the same only if this data model is the same. The rules for encoding the model in XML allow the same graph to be encoded as different XML documents, but as long as the un-serialised graph is the same, the assertions are equivalent.

The basic RDF data model consists of three types of objects:

Resource A resource is a thing being described by an RDF expression. It is anything that can be labelled by a URI.

Property A property is a URI representing a specific attribute, characteristic or relation that is being used to describe a resource. Its specific meaning, possible values and relationship to other properties are not prescribed by RDF itself, but may be encoded within an RDF schema definition [BG99].

Statement A statement consists of a resource, a property, and a value for that property. Together these form a statement of the form

<resource> HAS <property> <value>

where the value can either be another resource, or any legal XML value. This collection of <property, resource, value> is known as a triple.


Figure 2.20: The statement “The author of http://www.site.org/page.html is James Uther” as an RDF graph.


Figure 2.21: The statement “http://www.site.org/page.html was authored by someone who is called James Uther and who has an email address [email protected]”.

Thus the above assertion (“The author of http://www.site.org/page.html is James Uther”) is shown in RDF as the statement in Figure 2.20. This shows “James Uther” as a simple string literal, which is not really that useful to a machine. “James Uther” is a person with details that the application consuming this metadata might find useful. Therefore we might actually use the graph in Figure 2.21, which may be transliterated as

http://www.site.org/page.html was authored by someone who is called James Uther who has an email address [email protected]

There are some other interesting points in Figure 2.21. The resource that has references to “James Uther” and his email address has no URI. This is called an anonymous resource, and is perfectly legal in RDF for just this purpose. A resource may also be given an ID that allows references to such an anonymous resource from other statements, without having it actually refer to any external object. The properties of this resource are “NS:Name” and “NS:Email”. The “NS:” here is a standard XML namespace prefix that allows these properties to have names unique to the entire Web. For instance, “NS:” could have been set to equal “http://www.gmp.usyd.edu.au/schema/people/”, in which case “NS:Name” would expand within the RDF processor to “http://www.gmp.usyd.edu.au/schema/people/Name”. This allows mixing of schemas within a single RDF graph without collision. In this case we have the “NS:” namespace, and the “DC:” namespace that refers to the standard Dublin Core [dub99] attribute set. The text referred to by the property URL is often a description of the semantics of that property, but could also be a formal description of the property using an RDF schema [BG99] that outlines possible values of the property and relations to other properties.


Figure 2.22: The statement “http://www.site.org/page.html was authored by someone whose particulars can be found at ldap://ldap/uid=hemul, whose common name (cn) is James Uther and who has three email addresses as shown”.

The RDF specification uses these basic elements to build some more complex structures. There are three container types:

Bag An unordered collection of resources or literals. Bags are used to represent statements like “The students in the course are Petri, Risto, Katri, and Titta.”. Duplicate values are permitted.

Sequence An ordered collection of resources or literals. Used for statements like “The days of the week are Monday, Tuesday, Wednesday, ...”. Duplicates are permitted.

Alternative A collection of resources or literals that represent alternatives for the (single) value of the property. This may represent statements like “The document may be downloaded from ftp.cs.usyd.edu.au, mirror.aarnet.edu.au or ftp.vendor.com”.

An even more realistic version of the example given above might be if the “DC:Creator” attribute referred to a resource that was a node on an X.500 tree accessed via LDAP [HS97], as shown in Figure 2.22. In this case we might show the LDAP attributes for the person, which may include a list of alternative email addresses. The “NS:” namespace has been replaced with an “ldap:” namespace, which would encompass the possible LDAP attributes. This example also uses the alternative container type. As can be seen in the figure, the container is an anonymous resource, with an “rdf:type” of “rdf:Alt”. By convention the contents of the container use the attribute “rdf:_x”, where x is a unique number within that container. By using “rdf:Alt” rather than another container type we are asserting that any, but only one, of these email addresses is to be used at a time. The RDF specification contains examples of more complex statements, such as statements about statements (metametadata) and other constructs that, while interesting, are not used in this work.

Serialisation of RDF

To transfer RDF graphs, a serialisation must be used. The graph must somehow be boiled down into a stream of bits, or at least UTF-16 encoded characters. The two most widely used serialisations of RDF are an XML representation and a triple representation.

RDF can be formally viewed as a collection of triples, each expressing one RDF statement. A triple may be of the form

{predicate, subject, object}

Square brackets can denote a resource and quotation marks denote a literal. In this case the graph shown in Figure 2.20 on page 39 would serialise as

{Author, [http://www.site.org/page.html], “James Uther”}

There is a formal mapping between the RDF graph and its representative triples. These triples can be consumed quite naturally by logic programming environments.
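A triple is also trivial to represent in code. A minimal sketch, assuming Java; the class is purely illustrative, and real applications would use one of the RDF frameworks mentioned below:

/** One RDF statement, in the {predicate, subject, object} form used above. */
public class Triple {
    final String predicate; // a property name or URI, e.g. Author
    final String subject;   // the resource being described, as a URI
    final Object object;    // another resource, or a literal value

    Triple(String predicate, String subject, Object object) {
        this.predicate = predicate;
        this.subject = subject;
        this.object = object;
    }

    /** Prints the statement in the triple notation used in the text. */
    public String toString() {
        return "{" + predicate + ", [" + subject + "], \"" + object + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(new Triple(
                "Author", "http://www.site.org/page.html", "James Uther"));
    }
}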

RDF graphs are more usually serialised as XML. The graph in Figure 2.20 on page 39 can simply be represented as

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://description.org/schema/">
  <!-- the 's' schema namespace here is illustrative -->
  <rdf:Description rdf:about="http://www.site.org/page.html">
    <s:Author>James Uther</s:Author>
  </rdf:Description>
</rdf:RDF>

or in the RDF abbreviated syntax

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://description.org/schema/">
  <rdf:Description rdf:about="http://www.site.org/page.html"
                   s:Author="James Uther"/>
</rdf:RDF>

The more complex example in Figure 2.21 on page 39 might be

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:DC="http://purl.org/dc/elements/1.1/"
         xmlns:NS="http://www.gmp.usyd.edu.au/schema/people/">
  <rdf:Description rdf:about="http://www.site.org/page.html">
    <DC:Creator rdf:parseType="Resource">
      <NS:Name>James Uther</NS:Name>
      <NS:Email>[email protected]</NS:Email>
    </DC:Creator>
  </rdf:Description>
</rdf:RDF>


while the example in Figure 2.22 on page 40 would be

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:DC="http://purl.org/dc/elements/1.1/"
         xmlns:ldap="http://www.example.org/ldap/schema/">
  <!-- the ldap schema namespace here is illustrative -->
  <rdf:Description rdf:about="http://www.site.org/page.html">
    <DC:Creator rdf:resource="ldap://ldap/uid=hemul"/>
  </rdf:Description>
  <rdf:Description rdf:about="ldap://ldap/uid=hemul">
    <ldap:cn>James Uther</ldap:cn>
    <ldap:mail>
      <rdf:Alt>
        <rdf:li>[email protected]</rdf:li>
        <rdf:li>[email protected]</rdf:li>
        <rdf:li rdf:resource="mailto:[email protected]"/>
      </rdf:Alt>
    </ldap:mail>
  </rdf:Description>
</rdf:RDF>

The serialisation of RDF as an XML document allows for relatively simple parser implementations. At the time of writing the RDF web page¦ lists some eleven freely available RDF parsers implemented in all popular programming languages. At least two frameworks that include RDF parsing, storage and search are listed on the same page.

RDF in User Modelling

A user model is a representation of knowledge about a user. Thus it seems logical, when designing a user model format for use on the Internet, to use an Internet standard for representing knowledge. RDF serves this purpose. Although RDF was originally developed to suit the needs of the metadata community, it retains roots in the knowledge representation field. The expressive power of RDF was more than adequate for representing the user models used in this thesis.

Using RDF lends some important benefits:

¦http://www.w3.org/RDF/

• Using a standard format allows one to leverage parsing and manipulation tools developed for the format.

• Models based on RDF are extensible, as assertions may be added to the graph without interfering with previous assertions. The use of namespaces allows previously unknown ontologies to be mixed without clashing.

• Other tools may generate and consume our user models. There has been recent work on layering reasoning on top of RDF on the web, thus creating a ‘semantic web’ [BLHL01]. It would be desirable for user modelling shells to integrate with any such development.

Future chapters will describe a design for a user model in RDF.

2.5 The Java Programming Environment

Most of the tools implemented for this thesis were written in the Java programming language, or in other languages compiled to run on the Java Virtual Machine (JVM). Java was chosen as the only language and environment generally available that provided the ability to run programmes on heterogeneous systems with some degree of efficiency and safety. Although browser plug-ins existed for some other interpreted languages, Java was the best fit for the task. Here I shall give a short overview of the benefits offered to this work by Java.

The Java environment consists of three things:

• The Java language, a C-like object-oriented language that can be compiled to Java ‘bytecodes’;

• The Java Virtual Machine (JVM), that executes Java bytecodes;

• The Java standard libraries, that provide a plethora of generally useful services to programmes executing within the JVM.

The JVM is, conceptually, an interpreter that allows any valid Java bytecode executable to run on any platform that has a JVM. In this respect it is similar to PYTHON, PERL, LIMBO, TCL, RUBY and LISP, among others. However, unlike most of these, it is secure, which makes it particularly suited to running code gathered from the Internet. Code executed within a JVM runs within a secure environment, or sandbox. Any attempts to access secured system resources (disc, network, etc.) are governed by a security policy. This makes it safe to download ‘untrusted’ code from the Internet and run it on a local machine. As long as the security policy and its implementation are correct, there is no danger of files being wiped or copied, or of trojan activity by the untrusted code. This is important for visualising a user model. User model visualisation tools should not pose security risks to the user.

Another advantage of Java over some programming environments is that the compiled code itself was designed for loading over a network. Classes may be loaded on demand, and from any URL (within the bounds of the current security policy). The bytecode format is optimised for size, so downloads are relatively fast. Furthermore, the Java standard libraries have strong networking capabilities. This made implementation of network functionality in the tools described in this thesis less difficult.

Chapter 3

Domains and Models

This thesis explores the development and evaluation of a visualisation of large user models containing 500 or more components. This is well beyond the number of components visualised in previous work. The visualisation was also generic enough to target the two main classes of user model:

• Representation of the user’s knowledge. This is particularly relevant in Intelligent Tutoring Systems (ITS) [Sel94].

• Representation of user preferences, as in recommenders [res97].

It was decided that the design and refinement of VlUM should be informed by working in the context of two domains, one from each of the two classes of user model listed above. The representation of a user’s knowledge was taken from data collected by an online assessment system in a medical school. The representation of user preferences was gathered from data on the Internet Movie Database (IMDB)∗ web site.

Since these two domains were explored throughout this thesis, they may have influenced aspects of the design of VlUM. We now provide a comprehensive description of those domains, both because of their influence on this work and because our evaluations will be described in terms of them.

3.1 Medical Knowledge

The representation of user knowledge was gathered from a graduate medical program at the University of Sydney. The University of Sydney Medical Program (UsydMP) [gmp] teaches a four year medical curriculum to an intake of graduate students. The curriculum is problem-based, consisting of seventy-four weekly problems in the first two years of the course, and a more complex arrangement in the last two years.

∗http://www.imdb.com/


The course emphasises clinical practice from the start, with students placed one day a week in teaching hospitals from the first week, where situated learning is possible. At regular intervals the students are placed in hospitals across the state of New South Wales for periods of three weeks or more. The use of the World Wide Web affords students access to teaching resources and facilities regardless of location.

3.1.1 Learning Topics

Each problem in the first two years is structured around eight to ten learning topics, or components of the curriculum that can be summarised in two pages (see Appendix A on page 139). All learning topics are available on the course web site.

There have been changes and trials in the medical program over the years, some of which have impacted on this thesis. One in particular I should bring to the reader’s attention: the ‘destructured learning topics’ trial.

The first two years of the course were designed mainly by defining some six hundred topics that were to be covered in those years, forming seventy-four groups from them, and then constructing the weekly problems from these groups of topics. The process iterated, and at times worked in the opposite direction. When students first started doing the course, the learning topics that underlay the week’s problem were hidden until the end of the week, at which time they were revealed for the students to study specifically.

After a time some faculty felt that this strict association of topics to problems was not in the students’ best interest, and that they should have some say in which topics were most associated with the weekly problem. I† placed all topics in a single store, and implemented a search engine over them. The problem web pages were given a search dialog at the start of the week, and students were encouraged to search for learning topics that might be of use in the problem. Each search was logged, along with the problem page from which the search originated. At the end of each problem week a list was generated of the eight topics which were deemed most useful by the students for that week. This was calculated by counting the number of topics which were viewed from the search page that had been called from the problem page for that week. Thus the previous hierarchical structure of the learning topics was ‘destructured’. After a time it was found that although the learning topic search mechanism was useful, the use of the search count to form the week’s topic list was not popular, and was dropped. This affected the work described in this thesis, in that early attempts to model the Online Assessment domain were influenced by the idea of ‘destructured’ learning topics.

3.1.2 Online Assessment

During the development phase of the University of Sydney Medical Program (UsydMP), some faculty members expressed concern about whether obsessively motivated students (as medical students stereotypically are) would ever stop studying a learning topic in a Problem Based Learning course.

†I was employed to build and support the GMP web site at the time.

Figure 3.1: A multiple true/false question in Online Assessment

The solution was to provide a set of online questions for each learning topic. The questions are set by the author of the learning topic, and the standard of the questions supposedly reflects the expected standard of learning in that topic. This system has become by far the most used facility on the UsydMP web site.

This system was first implemented in Python CGI scripts as a senior programming assignment by computer science undergraduates. Since then I have re-implemented the system three times, as the underlying web development platform and the requirements have changed. The latest version is written as a Java servlet, with question and user model data stored in a Sybase database.

There are two major aspects to the online assessment system: the authoring and test interfaces. The test interface is the one directly relevant here.

Students using the online assessment system start by selecting which of the weekly problems they wish to revise. The system then selects a set of questions, which are displayed as in Figure 3.1. The student attempts to answer the questions, and then a mark and comment is given (Figure 3.2 on the next page).

Questions are offered in one of five forms:

• Single true/false, consisting of a question, a correct answer, and a response given to the student once the question has been answered. This further explains the question.

Figure 3.2: Answer in Online Assessment

Figure 3.3: Question Statistics in the Online Assessment.

Figure 3.4: Feedback in the Online Assessment. The image shows the feedback submission form.

• Multiple true/false, which is similar to single true/false but with multiple true/false sub-questions, each of which contributes a percentage of the total mark for the question.

• Multiple Choice, in which the student selects a single answer from a list of possible answers.

• Multiple Answer, in which the student selects more than one correct answer from a list.

• A-Because-B, in which there are three sub-questions: two true/false questions, and a last question of the form ‘the first question implies the second, true or false?’.

Figure 3.5: Feedback in the Online Assessment. The image shows the previous feedback page. The feedback author or the question author can delete the comment.

Figure 3.6: End of Session Statistics in the Online Assessment. Questions are categorised into subject areas.

At the answer page, students may ask for performance statistics on that question (Figure 3.3 on page 48) or submit feedback to the question author (Figure 3.4). They may also examine older feedback on that question that has not yet been deleted by either the question author or the feedback author (Figure 3.5 on the preceding page). When the student has completed the requested number of questions, or when they choose the quit option on a question answer page, they are taken to a page of broad statistics (Figure 3.6 on the facing page) that summarises their performance in the last session and across all sessions, and similarly their performance broken down by broad subject classifications.

To provide useful feedback to students about their progress, all answers are stored. Some effort was expended to convince students that this data would never be used for any type of summative assessment, and informal surveys (see Appendix B) suggest that students now use the system freely, unafraid of failure, and find it a useful gauge of their learning. We therefore consider the data gathered to be a reliable indication of what the student knows, because their interests are best served by being honest about their knowledge, and they seem to realise this. It also means that the students have placed some trust in the system, and it is up to the faculty and postgraduate researchers not to abuse that trust. A direct example in this case is that we could not show students or faculty the model of any other individual. Faculty are limited by respect for students’ privacy to seeing only cohort averages.

3.1.3 The Generated Online Assessment Model

In VlUM terms, the learning topics are components, and the marks in the online assessment for a topic are belief values with a given certainty.

Marks

The model was generated from the database of online assessment answers. The score for a topic was calculated by taking the average of all the student’s answers for questions in that topic. I also calculated the same average for the student’s entire cohort and, in VlUM 2.0, for the student one month before the date the model was generated. I added a ‘curriculum’ score, being the score the staff might wish of students at that stage in the course. The curriculum score was taken to be 70% if the problem had been seen by the students in the course, or 30% if it had not. It was felt that these values reflected the faculty expectations of these situations.

Certainty of each score was calculated as the percentage of questions in that learning topic that were answered by the student or cohort. This was straightforward for the user, the user one month ago, and the cohort average scores. The certainty of the curriculum score was taken to always be 100%. For example, if a learning topic had 10 questions, of which the student had answered 8, the certainty for the student score in that learning topic would be 8/10 × 100 = 80%.
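A minimal sketch of these calculations, assuming Java; the class and method names are illustrative, not the servlet’s actual code:

public class TopicScore {
    /** The belief value: the average of the student's marks for the topic. */
    static double score(double[] marks) {
        double sum = 0;
        for (double m : marks) {
            sum += m;
        }
        return marks.length == 0 ? 0 : sum / marks.length;
    }

    /** The certainty: the percentage of the topic's questions answered. */
    static double certainty(int answered, int totalQuestions) {
        return 100.0 * answered / totalQuestions;
    }

    public static void main(String[] args) {
        // The example from the text: 8 of 10 questions answered.
        System.out.println(certainty(8, 10)); // prints 80.0
    }
}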

The methods used for calculating the model are simplistic, but enough to evaluate the infrastructure and interface. Also, the amount of data available for the model was substantial, enabling a simple model to still be quite valid. However, a more realistic model would at least have taken account of the time since questions were answered, and resolved conflicting answers. There are toolkits such as UM [Kay99, KP95] available for building sophisticated models.

Peering Process

The components in a VlUM user model are linked to form a graph. The peering of the learning topic resources to create the graph for the model became a hard problem because initial attempts to base the graph on attributes of the online assessment questions were stymied by bad data.

An early idea stemmed from the work done on destructured learning topics (see subsection 3.1 on page 45). Topics that were popular in the same problem week were peered. It was expected that topics would appear in more than one week and we would end up with some interconnection between the weeks that formed a graph. Unfortunately the trial of destructured learning topics was dropped before this assumption could be tested.

The initial implementation tried to generate a graph based on similarity of the keywords associated with each learning topic. Unfortunately, it was found that the learning topic keywords had been chosen by the authors without reference to a standard medical set, nor with thought to usefulness in discriminating between topics. A simple script was written that took the keywords from the topics, sorted and then counted them. I found 3263 distinct keywords in 526 learning topics. The breakdown of how often keywords were reused is shown in the table below. As can be seen, by far the majority of keywords were only ever used once, so forming a graph based on their use is impractical. It seems that keywords tended to be invented by topic authors with little reference to defined or de-facto ontologies.

Appearances   Count
1             2680
2             392
3             108
4             53
5             11
6             7
7             6
8             2
9             2
10            1
-             -
14            1

There are internationally accepted metadata and classification schemes for medical domains [mes01, icd92], and the faculty agree that future adherence to such a scheme would be a good idea, and that reclassification would be good in principle. At the time of writing, however, such a major task does not seem to be a priority for the faculty.

Another attempt was made for the second version of VlUM, this time based on an alternative classification of the questions. Each question in the database is classified into one of eleven subject areas such as Abnormal function and mechanisms of disease. A graph was generated based on the similarity of the subject classifications of the topics' questions. For instance, if the topic Anatomy of the venous system in leg contained 4 questions in the category Anatomy & Histology, and the topic Anatomy of the hip had three questions in the same category, then it was assumed the topics were related in some way.

Some tuning was used to set thresholds for how many similarly categorised questions constituted a match. In spite of considerable experimentation, no tuning could get around the problem that the questions were badly misclassified into too few categories. In particular, too many questions were left in the category used as a default value when a question is created, Normal and abnormal structure. All this led to a very uneven graph, which in turn made navigation difficult. This was the problem that surfaced in the user tests explained in Section 4.4.1 on page 71.

These attempts showed that the metadata for the learning topics used by the GMP was not in a state to be used effectively in VlUM. Instead, the final graph was based on similarity in the names of the learning topics. If a topic had a name with no great similarity to many others, the graph was filled out at random. This method had the advantage that the graph could make sense to students (in that, for instance, Structure of the Foot and Structure of the Hand would be related, as they probably should be). The specific algorithm for matching names was simply to:

• split the name into words;

• sort from longest word to shortest;

• remove stop words (and, the, if, of);

• get a list of other topics sharing one or more of the same words;

• take the next link at random from that list, or at random from the list of all topics if that list is empty;

• repeat until the graph is filled.

The graph was then further filled by generating random links between under-linked components. This produced what seemed like a navigable graph, with some obvious basis for many of the links. Of course, a more correct solution would be to have the faculty design a useful ontology and re-categorise the topics. This was beyond the scope of this thesis.
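A minimal sketch of this name-based peering, written in modern Java for brevity (the original ran on Java 1.1); the Topic type, the peer quota and the retry bound are illustrative assumptions:

```java
import java.util.*;

// Sketch of the name-similarity peering pass described above.
class NamePeering {
    static final Set<String> STOP_WORDS = Set.of("and", "the", "if", "of");

    static void peer(List<Topic> topics, int peersWanted, Random rng) {
        // Index: word -> topics whose names contain that word.
        Map<String, List<Topic>> byWord = new HashMap<>();
        for (Topic t : topics) {
            for (String w : significantWords(t.name)) {
                byWord.computeIfAbsent(w, k -> new ArrayList<>()).add(t);
            }
        }
        for (Topic t : topics) {
            // Candidates sharing one or more significant words, longest-word first.
            List<Topic> candidates = new ArrayList<>();
            for (String w : significantWords(t.name)) {
                for (Topic other : byWord.getOrDefault(w, List.of())) {
                    if (other != t && !candidates.contains(other)) candidates.add(other);
                }
            }
            int attempts = 0; // bound added here so the sketch cannot loop forever
            while (t.peers.size() < peersWanted && attempts++ < 10 * peersWanted) {
                // Next link at random from the candidates, or from all topics
                // if no candidate remains.
                List<Topic> pool = candidates.isEmpty() ? topics : candidates;
                Topic pick = pool.get(rng.nextInt(pool.size()));
                candidates.remove(pick);
                if (pick != t && !t.peers.contains(pick)) t.peers.add(pick);
            }
        }
    }

    // Split the name into words, drop stop words, sort longest first.
    static List<String> significantWords(String name) {
        List<String> words = new ArrayList<>();
        for (String w : name.toLowerCase().split("\\W+")) {
            if (!w.isEmpty() && !STOP_WORDS.contains(w)) words.add(w);
        }
        words.sort((a, b) -> b.length() - a.length());
        return words;
    }

    static class Topic {
        final String name;
        final List<Topic> peers = new ArrayList<>();
        Topic(String name) { this.name = name; }
    }
}
```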

There were some limitations in the model generated for the Online Assessment domain. The implementation of Learning Topics, both within the GMP web site and the Online Assessment servlet, did not allow the user model to hold the URL for these in the component. The URL was, instead, fabricated from the unique ID for that topic in the Online Assessment database. Since this linking was not needed for the large scale user tests, I did not undertake to re-implement the GMP web site to allow it.

Figure 3.7: IMDB entry for Roman Holiday

3.2 The Movie Preferences Domain

The Internet Movie Database (IMDB)‡ stores data on thousands of movies. All this information is available in the form of HTML pages accessed via HTTP. The information is also available as text files, but these files do not include enough information to immediately deduce the URL for a movie on the IMDB web site. Since I required this URL, I chose to gather the data from the web site itself.

There are two types of pages on the IMDB web site that concern us here: one for movies, and another for actors. A page for a movie will show the title, year, writer, producer, cast, a categorisation from a list of twenty genres, and reviews from the public. Such a page may be seen in Figure 3.7. The cast, producer and writer information contains links to a page for each person, who each have a page like the one shown in Figure 3.8.

‡http://www.imdb.com/

Figure 3.8: IMDB entry for Audrey Hepburn

The IMDB also includes a rating system in which a movie may be rated by visitors; the current average rating for the movie (out of ten) is given on the movie page, along with the number of votes that informed the calculation. This gives an average popularity of the movie within a large population. The page for a person (whether actor, writer, producer or director) contains a list of movies the person has been associated with, with a link to each movie.

I modelled preference information on movies. In this domain I did not have any true model of a user, so I used the rating from the IMDB. In the parlance of the previous section, here VlUM components are movies, and belief values are the ratings of the movies. The certainty of the concept was taken to be the log of the number of votes for the movie divided by the log of the maximum number of votes for any movie, i.e. log(votes)/log(maxVotes). This gave a maximum certainty of 1 for the most popular movie, and a theoretical minimum certainty of 0. Since popular movies received very many more votes than more obscure ones, a linear relationship did not lead to a good separation between the more obscure movies, so the logarithmic relationship was used.

I wrote a web crawling programme that pulled pages from the IMDB. It started with a movie, extracted data from the page, and then followed links to the pages for the cast, from which more movies were linked. In this way a graph of movie titles, linked by similarity in cast, could be determined.

The precise algorithm followed by this program was:

• The page for the first movie in a queue is pulled from the IMDB. The queue is initialised with Roman Holiday. If we have already retrieved the required number of movies (in this instance we retrieved seven hundred), we stop.

• Each page is parsed to extract information on title, cast, year and rating.

• For each of the first seven people found on the movie page (which usually includes the writer, producer, director and lead roles), the page for that person is retrieved.

• The page for the person retrieved in the previous step is parsed to extract the movies they have worked on, store the relations between actors and movies, and add the movies to the queue of movies to be retrieved.

• The process is repeated with the next movie on the queue. This gives us a breadth-first search of the graph formed by the movies and actors (a sketch of this crawl in code follows).
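A sketch of the crawl, with the HTTP fetching and the 2001-era HTML scraping elided behind stubs; all names and the structure here are illustrative, not the original implementation:

```java
import java.util.*;

// Sketch of the breadth-first IMDB crawl described above.
class MovieCrawler {
    static final int MOVIES_WANTED = 700;  // seven hundred in the actual run
    static final int PEOPLE_PER_MOVIE = 7;

    void crawl(String startMovieUrl) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(startMovieUrl); // initialised with Roman Holiday
        seen.add(startMovieUrl);
        int retrieved = 0;
        while (!queue.isEmpty() && retrieved < MOVIES_WANTED) {
            String movieUrl = queue.poll();
            MoviePage movie = fetchAndParseMovie(movieUrl);
            retrieved++;
            // Follow the first few people on the page (usually the writer,
            // producer, director and lead roles), queueing the movies they link to.
            for (String personUrl : movie.peopleUrls.subList(
                    0, Math.min(PEOPLE_PER_MOVIE, movie.peopleUrls.size()))) {
                for (String linkedMovie : fetchAndParsePerson(personUrl)) {
                    recordRelation(movieUrl, personUrl, linkedMovie);
                    if (seen.add(linkedMovie)) {
                        queue.add(linkedMovie); // breadth-first expansion
                    }
                }
            }
        }
    }

    // --- stubs standing in for the HTTP fetch and page scraping ---
    static class MoviePage { List<String> peopleUrls = new ArrayList<>(); }
    MoviePage fetchAndParseMovie(String url) { return new MoviePage(); }
    List<String> fetchAndParsePerson(String url) { return List.of(); }
    void recordRelation(String movie, String person, String otherMovie) { }
}
```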

A second programme is then used to take the data generated above, find the strongest links between movies, and dump the resulting graph to a file, as seen in Figure 5.12 on page 88. The full description of the final version of the data format used can be found in Section 5.2 on page 86.

The strongest links were calculated by sorting on the number of links between any two movies. For instance, if two movies had the same writer, director, and senior cast member, then they have a link of strength three, which is used in preference to a movie which only shares a cast member. The six strongest links for each movie are used.
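A minimal sketch of this ranking, assuming the shared-people counts between a given movie and each candidate have already been accumulated by the first program:

```java
import java.util.*;

// Sketch of the link-strength ranking: candidates are scored by the number
// of people (writer, director, cast) shared with the movie in question,
// and the six strongest are kept as peers. Names here are illustrative.
class LinkRanker {
    static List<String> strongestLinks(Map<String, Integer> sharedPeopleCount) {
        // e.g. same writer, director and a senior cast member = strength 3.
        List<Map.Entry<String, Integer>> candidates =
                new ArrayList<>(sharedPeopleCount.entrySet());
        candidates.sort((a, b) -> b.getValue() - a.getValue()); // strongest first
        List<String> peers = new ArrayList<>();
        for (Map.Entry<String, Integer> e : candidates) {
            if (peers.size() == 6) break; // six strongest links per movie
            peers.add(e.getKey());
        }
        return peers;
    }
}
```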

Because the data set used to generate this graph came from searching the IMDB in a breadth-first manner, the graph so generated tends to be strongly connected around movies close to the starting movie, but less well connected towards what might be termed the ‘outside’ of the graph. This is not homogeneous enough for our purposes. Movies on the outside of the graph may end up with too few peers. If a movie is weakly linked and dealt with late enough in the process for any possible peers to already have the maximum number of peers themselves, the algorithm will not be able to join it to the graph. It is also possible for ‘islands’ to develop through a similar process. A similar problem was observed by Munzner [Mun00] in the H3 network visualisation. In that work an inhomogeneous graph could end up looking like the one shown in Figure 3.9 on the next page.

Figure 3.9: Munzner's H3 system when displaying an inhomogeneous network. Image courtesy of T. Munzner.

As a simple solution to these problems, I filled the graph with additional links in later passes. On the second pass I added links to under-linked movies based on the similarity of their titles with the titles of other under-linked movies. A third pass filled the remaining links by randomly selecting under-linked movies and connecting them. These methods are increasingly less valid. An implementation that used real data for a real user would probably wish to use a more formal method.

The movies were written to the RDF file in order of year, so the oldest movies appear at the top of the screen.

Chapter 4

Design Constraints and Early Design Experiments

Various versions of VlUM have been built and tested. Early versions explored ways of displaying the graph, as well as different file formats for the user model itself. Each was user tested and critiqued technically, and then redesigned and re-implemented. This chapter describes this process in detail.

I shall start by outlining the constraints under which these systems were built. I shall then describe the evolution of the tool. Each evolutionary step involves a redesign of the interface and model format, and some new implementation approaches.

4.1 Design Constraints for a Visualisation of Large User Models

There are several requirements for visualising any large user model. As already described in the introduction, there are a number of questions a user may wish to ask. These include:

• What might I be interested in, or doing well in?

• What might I not be interested in, or not doing well in?

• Where is the threshold for deciding what is ‘interesting’ or ‘good’? (This threshold should be adjustable.)

• How strongly does the model hold each belief?

• How are the model components related?

• Why does the model believe this?


Thus, as discussed in Section 2.1.2 on page 14, the tool must show an overview of the model components, showing the certainty and value for each component. It must also show the relationships between the components, their relevance in any current context, and the component types. It must allow navigation around the model, and finally, it should be able to show more detail about a particular component, possibly by using an associated tool. For instance, VlUM could use the tools in UM [Kay99] to show more detailed information about the inferences associated with a particular component when that component is selected.

There were further requirements and constraints in designing VlUM. It was to be possible to use VlUM within the GMP web site, at a variety of locations. These extra design constraints were:

A generalised user model format The model was to be sent over the network, and so some format was required. This format needed to be general enough to encode the models from the two domains, medical knowledge (Section 3.1 on page 45) and movies (Section 3.2 on page 54).

Adjustable ‘Category Boundary’ I wanted a student to be able to find topics in which they were doing well, or badly. In this case, a student should be able to set their preferred standard for ‘well’ or ‘badly’, and VlUM needed to respect that standard. Thus the boundary between the ‘good’ and ‘bad’ categories needs to be adjustable.

Usable with much more than 100 items VlUM needed to display all the learning topics in the UsydMP: over 540 items.

Usable on an average web browser I wanted to run experiments in the usual computing environment of the target audience. I took this audience to be students in the university medical programme, which provided workstations with 200 MHz Pentium Pro processors and 32 MB of RAM. They ran Windows NT 4.0 and a Netscape 4.x browser, and had 17” monitors at 1024×768 pixels. This forced me to implement in Java. Since Java on these machines used a primitive just-in-time compiler, VlUM had to run efficiently.

Usable over slow networks Students were expected to be able to use VlUM at home or in regional hospitals. These machines were often at the end of a modem or overloaded ISDN link. Since the model was prepared and stored on the main server, latency and bandwidth were major design constraints.

Usable in a small space VlUM is, at one level, a navigation tool, and so it made sense to be able to show the currently ‘interesting’ topic on a web page at the same time as the VlUM display. This required fitting the VlUM display, beside a full web page, within a 1024×768 pixel screen.

Unlike the work by Kay [Kay99], the user models used in this thesis had only one type of component, and so no attempt was made to show the component type in the overview.

4.2 Description of a VlUM Model

VlUM consumes a user model consisting of components. Each component refers to related components, and so the model as a whole is a graph. Each component also contains a name, a belief value (or score) and the certainty for that belief.

For example, in a movie recommendation service, the components are the movies to be recommended. The movies are related in some way, such as similarity in cast or genre. Alternatively, in a model of a domain being taught, the components are the ‘bits’ of knowledge to be learned. In a medical course, these components might be things like Structure of the Gastro-Intestinal Tract or Taking a nutritional history, and are sometimes known as learning topics, or simply topics.

In all implementations in this thesis, components in the model are ordered. This is implicit in the file format, in that components are simply listed in an order. However this ordering can carry important information, and most of the implementations of the VlUM visualisation have used it. For example, in the movie domain the movies are listed by date of publication. Similarly the Online Assessment topics are listed by date of release to students. Any non-serial file format for the model would need some method of encoding a linear ordering of components.

4.2.1 The Component Data

Each component in a VlUM model may contain data about the user's preferences or performance for that component. The data has both a degree, which may be a mark or a rating, and a certainty that indicates how strongly the mark is to be believed, or how much data supports the mark. Both of these values may range between zero and one inclusive. In the movie recommendation domain, the score would be how strongly the user is assumed to like, or hate, the movie. The certainty would be how strongly the model holds that belief. So a movie might be strongly and certainly recommended, or it might be recorded that the model simply doesn't know what the user would think about the movie.

There may be more than one data set associated with the model. For instance, in the Online Assessment domain we have data for the user's average, the cohort's average, and others. From the second implementation on, VlUM itself could provide synthetic data sets from these by comparing one against another. An example of this, again from the assessment domain, is the comparison between the user's and the cohort's scores, generated by subtracting the cohort score from the user's value for each component, and re-normalising. Certainties are combined by multiplying the certainties of the two data sets for the concept. For example, if in the online assessment domain we have a concept with an individual user average of 0.85 and certainty of 0.7, and a cohort average of 0.75 and certainty of 0.8, then the synthetic data for that concept would have a value of 0.85 − 0.75 = 0.1 with a certainty of 0.8 × 0.7 = 0.56. This follows normal procedure for combining error bar values.
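A sketch of the synthetic comparison using the numbers above; the exact renormalisation VlUM used is not spelled out here, so the mapping of the raw difference from [−1, 1] onto [0, 1] is my assumption:

```java
// Sketch of a synthetic 'difference' data set comparing the user against the cohort.
class SyntheticComparison {
    // Raw difference of belief values, e.g. 0.85 - 0.75 = 0.1.
    static double rawDifference(double userScore, double cohortScore) {
        return userScore - cohortScore;
    }

    // One plausible renormalisation of the raw difference onto 0..1
    // (an assumption; the thesis only says the difference is re-normalised).
    static double renormalise(double rawDifference) {
        return (rawDifference + 1.0) / 2.0;
    }

    // Certainties multiply, e.g. 0.7 x 0.8 = 0.56.
    static double combinedCertainty(double userCertainty, double cohortCertainty) {
        return userCertainty * cohortCertainty;
    }
}
```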

4.2.2 The Graph

Most user modelling shells store the model as a set of atomic components, with some also adding an inheritance hierarchy. However, graphs of relationships are important for structuring a large collection of components so that they may be visualised and navigated. In fact, most user models have a number of both explicit and implicit associations between the components. Some examples are:

The chain of inference If the model believes β because α → β and it already believes α, then the user may want to see that association.

Genetic epistemology graphs If a learner model in an Intelligent Teaching System is based on an overlay of a graph of components, then that graph relates the components.

Natural relationships Some domains have natural relationships between the components, particularly where the components are ‘real’. For instance, in the movie domain movies are related by director, cast, writer, and so forth. Books could be related by author, genre and topic. A final example could be academic papers, which are related by citation.

Synthetic relationships Books may have natural relationships, but a book recommenda- tion service such as the one offered by AMAZON.COM may create other relationships. For instance, two books may be related because people who buy one also tend to buy the other.

In the VlUM model format, the components exist independently, but may be connected into a graph as the domain dictates. Any component may be related to any other component. The relationships may be one way (directed), but in the models in this thesis components are linked in both directions, and so the graph is undirected.

4.3 Version One

The design constraints mentioned above immediately eliminated some solutions. Three-dimensional displays were not possible: although some solutions for 3D programming in Java were becoming available, the workstations in use did not have the required 3D acceleration hardware. Large graph layouts, as shown in Appendix I on page 229, were eliminated by the small screen space available and the interactivity requirements. Techniques for making cluttered displays more easily read, such as anti-aliasing and transparency, were also too CPU intensive.

The long network latency limited how often VlUM could interact with the server. In fact, all the implementations download a single file at startup time. This contains all the information needed for interactivity. This has been adequate so far, although a full user model file is around 500 kilobytes, making startup a little slow.

I modelled the initial design on the vertical component listing in DEXTER (see Section 2.2.3 on page 25). DEXTER displays a graph of some 100 items in a small space, with good interactivity. It does not, however, display all the information we wished to show, nor does it display the number of items we were aiming for. Even so, it seemed to offer some promising ideas as a basis for a visualisation which would meet our requirements. Unfortunately, there are no reported evaluations of its usability.

4.3.1 Display

A screen dump of the initial implementation of VlUM can be seen in Figure 4.1 on the next page. The tool was designed to be placed on one side of a browser window, allowing space for a web page to also be displayed. It consisted of a menu bar at the top, with menus DISPLAY and ACTION. DISPLAY allowed the user to select which model they wished to see: their own, their cohort's, or a hypothetical ‘average student’. The ACTION menu offered the ability to show further information about the selected component in the adjacent web page. A slider could be used to set the point at which VlUM considered a component value to be ‘good’ or ‘bad’. This allowed the student to change what constituted a ‘fail’ to suit their expectations for themselves. Beside this was a label detailing the current model being shown, and the current value of the slider. Below this was the visualisation itself.

In the visualisation, user model components were arranged vertically, in the order specified by the model file. Like DEXTER, the first version of VlUM moved the selected component to the top of the display to create a ‘reverse history’. The relationship of components to the selected component was shown by giving peers more space the more closely they were related to the selected component. So in Figure 4.1 on the following page, the component named Lymphatic system and the spread of cancer is the selected topic, and Structure of respiratory tract is one of the peers of the selected component.

I used hue to show the average mark for a topic by colouring the topic green for a good mark, and red for a bad mark. Yellow was used to colour a topic with no information. The degree to which the mark was bad or good was shown by the saturation of the title colour. For instance, a topic in which the evidence points to the student having a good mark will be coloured a strongly saturated green, while a topic in which the student is only doing moderately well will be a paler green. Similarly, a topic in which we believe the student is doing very badly will appear a strong red, while a topic the student is merely not doing well in will be a paler red. Topics about which we have no modelling information are shown in yellow. These components are left in the visualisation because a user may still wish to see them; the fact that the model has no information is itself an interesting fact. The component value that marked the point of change between red and green was set by the slider above. The colours could be changed (in the applet parameters in this implementation), so in the case of colour blindness more appropriate colours could be used. Ideally, this sort of adaptivity would be based on information in a user model.
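A minimal sketch of this colour mapping; the precise hue, brightness and linear saturation scaling constants are my assumptions, as the thesis gives only the qualitative scheme:

```java
import java.awt.Color;

// Hue encodes good (green) vs bad (red) relative to the category boundary;
// saturation grows with distance from the boundary; yellow marks no data.
class TopicColour {
    static final float GREEN_HUE = 120f / 360f; // assumed hue values
    static final float RED_HUE = 0f;

    static Color forScore(Double score, double boundary) {
        if (score == null) {
            return Color.YELLOW; // no modelling information for this component
        }
        boolean good = score >= boundary;
        // Distance from the boundary, scaled to 0..1, drives saturation.
        double range = good ? (1.0 - boundary) : boundary;
        float saturation = range == 0
                ? 1f
                : (float) (Math.abs(score - boundary) / range);
        return Color.getHSBColor(good ? GREEN_HUE : RED_HUE, saturation, 0.8f);
    }
}
```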

Figure 4.1: First implementation of the visualisation

Figure 4.2: Stretching in the first implementation

In DEXTER, when you click on a topic, the graph is redisplayed so that the selected and related topics are given more space. This necessarily clumps other topics together, often to the extent of causing the titles to overlap. Selection of these cluttered titles by mouse click becomes difficult. In VlUM, where more topics are displayed in the same space, selection becomes impossible. In DEXTER this is not a problem, because at the time these topics are thought to be ‘uninteresting’. In VlUM, a student may be trying to find a topic which is ‘very red’, and sees a few pixels of red in a dense cluster of titles. Selection of the red topic is difficult. It would be helpful to allow the student to expand the region around the red pixels to examine the contents more easily. In VlUM this is known as ‘stretching’, and an example may be seen in Figure 4.2. The most common use of stretching is to pull apart two topics to expose the region between them. A user may ‘grab’ a topic by clicking it with the mouse button, and then ‘dragging’ it to a new position. VlUM redraws the graph with the dragged topic fixed at the position the user dragged it to. As many topics as desired may be dragged around in this manner. Topics remain fixed in their new position until

• a previously dragged and fixed topic has another topic dragged over it. In this case the first topic is unfixed and allowed to move again to maintain topic order.

• another topic is selected with a single mouse click. This releases all dragged topics.

4.3.2 Implementation

The first version of VlUM was implemented in Java version 1.1, and worked around some early bugs in the virtual machine of the Netscape 4.0 browser. It required the use of an early XML parser, and the SWING GUI library was used to show the slider control. This led to a download of more than 1.5 megabytes of Java libraries. This was unfortunate, but the primitive state of the Java version shipped with the Netscape browser forced me to include these extra packages.


Figure 4.3: RDF schema and serialisation of first implementation. It represents a learning topic (topic 323), with two keywords (Pain and Arteries). It has an average (GMP:average) mark of 80%. Notice the lack of explicit peers. Linking of resources was left to the viewer.

Model

VlUM consumes user models described in the Resource Description Framework, described in Section 2.4.3 on page 38. The graph and serialised form used in the first implementation of VlUM can be seen in Figure 4.3. There were three reasons for using RDF:

• It naturally described the domain

• It is standards based, ensuring interoperability of user model servers and clients.

• There are good parsing packages available for XML, and RDF parsers and query interfaces are being actively developed.

The file consisted of a number of statements about each resource (a model component), identified by a URL. There are multiple resources in the model file. The statements for each resource simply described attributes of the resource, such as its name, keywords associated with it, and the marks for the user, the user's cohort, and the expected marks.

Influenced by the availability of keywords in the GMP Learning Topics, the first data model was based on keywords, which were used to generate a graph at the client. The graph was constructed by taking the keywords for each topic, and finding other topics with a similar set of keywords.

Figure 4.4: First implementation of selection. Functional Anatomy of the GIT would be selected because its base line is the first one within six pixels of the y position of the tip of the cursor.

Selection

Selection of topics in this version was accomplished by storing the vertical (y) position of each topic. The topmost topic within 6 pixels of the y position of a mouse click was selected, as shown in Figure 4.4.

Graph Layout

The model graph was generated by finding components with the same keywords. Any component was related to all components with the same keyword.

When a component is clicked with the mouse in VlUM, it is deemed selected. The graph then needs to be redrawn to reflect this. In VlUM, I took the position that showing the complete graph was not the most effective interface for navigation through the model. Instead, the user was more likely to be interested in the spanning tree through the graph rooted at the currently selected component, which would show the relationship of all components to the currently selected component. The algorithm for calculating the spanning tree was (a code sketch follows the steps):

1. Take the root concept, mark it with the depth in the tree (zero), a flag to indicate that it has been visited, and add it to a FIFO queue.

2. Take a concept from the queue, find its depth δ and its peers in the graph.

3. For each of the peers, if the peer has not been visited, set its depth to δ + 1, flag it as visited, and add it to the queue. If the peer has been visited, but its depth is greater than δ + 1, we have found a shorter path to that concept, so we set its depth to δ + 1 and put it back on the queue.

4. If the queue is not empty, return to step 2.
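A minimal sketch of this procedure; the Node type and adjacency-list representation are illustrative:

```java
import java.util.*;

// Sketch of the spanning-tree depth calculation: a breadth-first traversal
// that re-queues a node when a shorter path to it is found, as in the steps above.
class SpanningTree {
    static Map<Node, Integer> depths(Node root) {
        Map<Node, Integer> depth = new HashMap<>();
        Deque<Node> queue = new ArrayDeque<>(); // FIFO queue
        depth.put(root, 0);                     // step 1: mark root, depth zero
        queue.add(root);
        while (!queue.isEmpty()) {              // step 4: until the queue is empty
            Node n = queue.poll();              // step 2: take a concept
            int d = depth.get(n);
            for (Node peer : n.peers) {         // step 3: visit each peer
                Integer known = depth.get(peer);
                if (known == null || known > d + 1) {
                    depth.put(peer, d + 1);     // first visit, or a shorter path
                    queue.add(peer);
                }
            }
        }
        return depth;
    }

    static class Node { final List<Node> peers = new ArrayList<>(); }
}
```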

VlUM shows the depth of a component in the tree, and hence its ‘degree of interest’, by giving more ‘interesting’ components more space. The relative space left around a component at depth d, in a tree of maximum depth D, was calculated as (D − d) × 2 + 1; the added one ensures that components at the lowest depth still have some space. This was then adjusted so that all components fit on the display. For example, assume the spanning tree has a depth of six, with the root given a depth of 0. We thus give the root a relative space of (6 − 0) × 2 + 1 = 13, while the immediate peers of the root (depth 1) get a relative space of (6 − 1) × 2 + 1 = 11. Of course the components at the lowest level get a relative space of (6 − 6) × 2 + 1 = 1. This space is then renormalised to fit on the y axis of the display. The components are animated into their new positions as described below.

Animation

VisualWho [Don95] uses a system of imagined springs to position the items. In the VisualWho display, there are a number of anchor points, to which all items are connected with springs of variable spring constant $k_i$. It is possible to then update the positions of the items by simulating the physical system over a number of time increments. This physical simulation was slow, and quite unnecessary, since a body attached by known springs to fixed anchors has an easily calculable resting position. In the end Donath calculated this resting position and then implemented a stepping algorithm in terms of

$$x(t) = x_{\mathrm{rest}} + (x_0 - x_{\mathrm{rest}})\,\cos\!\left(t\sqrt{K - \tfrac{b^2}{4}}\right)e^{-bt/2}$$

where $K$ is a spring constant and $b$ is not explained in the paper. This gave a pleasing animation of the bodies slowing as they approached their resting position.

Another graph visualisation that uses physical simulation of a spring system for layout is the GraphLayout demonstration applet that has always come with the Java Development Kit from Sun Microsystems. This is a simple demonstration program that places n bodies on a plane at random, each body connected to two others by a spring. All springs are of equal length and have equal spring constants. The applet then calculates the movements of the bodies under the influence of the springs. The system usually finds a low energy resting position, and becomes still. Some other systems of this type find that the system never comes to rest, probably due to an interplay between the time increment chosen, the amount of friction in the system, and the initial configuration. In these cases the graph exhibits never-ending harmonic oscillations, which can render it difficult to view. A good example of this is the ‘Stanford Social Web’ visualisation∗ which shows a ‘social network’ [AA01]. The visualisation, at the time of writing, never finds a rest state, and the constant movement makes viewing difficult.

There were some attempts to use a simulated spring system in VlUM. The visualisation was modelled as a set of bodies resting on a flat surface, each connected to their neighbour by a spring, all springs of equal length, and the spring constant dependent upon the pair's depth in the spanning tree generated on topic selection. However, unlike VisualWho, the bodies in VlUM were in general attached to other moving bodies (their neighbours), and so the final resting position of a particular body was not easily calculable. Instead the full iterative simulation was required. This was prohibitively slow as the number of bodies in the simulation grew.

∗http://negotiation.parc.xerox.com/web10/, June 2001

The implementations that could show more than 100 components all used a simple iterative algorithm for the animation, described by

$$y_t = y_{t-1} + \frac{y_{\mathrm{rest}} - y_{t-1}}{2}$$

where $y_t$ is the position on the y axis of a title at step t, $y_{t-1}$ is its position at step t − 1, and $y_{\mathrm{rest}}$ is the resting position for the title calculated by the process given previously. Essentially, a final resting position is known, and at each iteration the title is moved half the remaining distance to that position. This process is repeated six times, at which point the titles are likely to be close to their final positions. We then redraw the titles in their final positions. This produces a smooth animation in which the titles approach their final position increasingly slowly, as shown in Figure 4.5. The six iterations give the titles enough time to approach their resting positions that the final step of absolutely positioning them does not seem to cause a ‘jump’ in the display. For small data sets the animation can take less than one second, which is too fast [RCM93]. Later implementations added some ‘governor’ code to slow down the animation if it was going to take much less than one second.

Figure 4.5: Animation of a title in VlUM.
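A minimal sketch of this animation step, with timing and repainting elided behind a Runnable:

```java
// Each iteration moves every title half the remaining distance to its
// resting position; six iterations, then a final absolute positioning.
class TitleAnimator {
    static void animate(double[] y, double[] yRest, Runnable repaint) {
        for (int step = 0; step < 6; step++) {
            for (int i = 0; i < y.length; i++) {
                y[i] = y[i] + (yRest[i] - y[i]) / 2.0; // halve remaining distance
            }
            repaint.run(); // stands in for scheduling a redraw
        }
        System.arraycopy(yRest, 0, y, 0, y.length);    // snap to final positions
        repaint.run();
    }
}
```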

4.3.3 First Formative Evaluation

The first implementation was presented to six participants for informal feedback. Participants consisted of students and staff of the medical faculty. This early version was tied to the Online Assessment domain described in Section 3.1 on page 45. The graph contained some 460 topics, and performance data from the model of a non-medical student (myself).

Feedback indicated that topic reordering did give some history, but removed a potentially more useful ordering along the y axis. The dragging feature also proved confusing, because topics were often dragged one pixel when the user simply meant to select. The user was left wondering why the selection didn't work.

On the implementation side, graph construction proved inefficient. This, combined with the lack of structure in the keyword set used in the learning topics (see Section 3.1.3 on page 52), led to this peering method being dropped in the next version.

Figure 4.6: Second version of the visualisation.


Figure 4.7: Second implementation of selection. Functional Anatomy of the GIT would be selected because it is the last topic for which the y position of the cursor tip lies between the font ascent line and the font baseline.

4.4 Version Two

The next version introduced fixes for some of the problems found previously, as well as strengthening the underpinnings. The re-ordering of components on selection was removed. This had the effect of removing the history facility, but allowed the implicit ordering of components within the model file to carry information. In this domain, learning topics were listed in the order they were presented to the students.

In this version, font size was used to emphasise the tree within the graph. A minimum and maximum font size were selected, and the depth of the component within the spanning tree was used to calculate a font size within this range. Additionally, the vertical discrimination of the selection algorithm was increased by storing the vertical size of each topic title and resolving the ambiguity of overlapping topics by picking the last topic over which the cursor actually lies, as shown in Figure 4.7 on the facing page.
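A sketch of the depth-to-font-size mapping; linear interpolation between the two bounds is an assumption, as the exact formula is not given here:

```java
// The selected (root) component gets the maximum size, the deepest
// components the minimum; depths in between are interpolated linearly.
class FontSizer {
    static int sizeFor(int depth, int maxDepth, int minSize, int maxSize) {
        if (maxDepth <= 0) {
            return maxSize;
        }
        double t = Math.min(depth, maxDepth) / (double) maxDepth; // 0 at root
        return (int) Math.round(maxSize - t * (maxSize - minSize));
    }
}
```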

At this stage, I also began experimenting with the ability to compare models. This allowed users to compare their model against the cohort and ‘curriculum’ models, as described in Section 4.2.2 on page 62.

The RDF graph was improved as well (Figure 4.8 on the following page), moving away from keyword-based peering towards an arbitrary graph determined by an external source. In this case peering was based on questions being in similar problem groups, as discussed in Section 3.1.3 on page 52. Each resource description contained references to other resources in the model. The VlUM client built the model graph from these references.

The evaluation, described in detail in Section 4.4.1, revealed that the display could become ‘cluttered’ when navigating through some parts of the graph, confusing the subjects. This confusion was mitigated for the third subject by shifting ‘uninteresting’ yellow topics to a ‘Shift Line’ about one sixth of the width of the display to the right, as shown in Figure 4.9 on the next page. This introduction of the x axis into the visualisation allowed me to add a ‘Differential Selection’ mechanism, in which a mouse click or mouse-over to the left of the ‘Shift Line’ preferentially selects a topic for which there is data, while a mouse event to the right of the line preferentially selects an ‘uninteresting’ topic, as shown in Figure 4.10 on page 73.

4.4.1 Second Evaluation

The evaluation was based on a think-aloud experiment, in which a student was asked to articulate their thoughts while performing tasks with the system. Only two subjects were studied before at least one major problem with the visualisation was found. We then implemented a workaround for that problem, involving shifting some topics, and turned off the dragging mechanism before a third subject attempted the same tasks.

The test started with a general introduction to the idea of user models and what we are trying to achieve. A hands-on tutorial on the system was then given, in which the students were shown how to select components, move obscuring components out of the way (in the first two students), select views, and set the pass mark.


Figure 4.8: RDF schema and serialisation of second implementation. Peers were added, although at this stage they were not true RDF resource references.

Figure 4.9: The model visualisation after clicking on a component with many or no peers. A fix, shown on the right, is to shift the components for which there is no data to the right. The font size of these shifted components was also reduced somewhat.


Figure 4.10: Third implementation of selection. The line labelled (A) indicates the point at which components for which there is no evidence are placed. Components for which there is evidence go some distance to the left of this line, and selection ambiguity is resolved by selecting the component that is on the same side of the line as the cursor. So in the left figure, Functional Anatomy of the GIT would be selected, but in the right figure it would be Fuel use in Cells.

The subjects were then asked to find a topic in which they were

• doing badly

• doing well

• doing worse than most of their class.

The users in the experiment were students in the first two years of the medical programme. All students performed this experiment with the same user model, which had been built from data for a randomly selected student. This student seemed to be fairly typical of the class.

Results

Before discussing the users' performance on the set tasks, we give an overview of our observations of their use of the visualisation interface.

The first student (S1) had problems with selection, tending to drag components rather than select them, leading to confusion about what clicking on a component really did. The slider was understood after a few seconds of conversation. It was demonstrated by setting it to 17%, which caused S1 to think they were doing very well for a moment (most topics turned green), but they understood a few seconds later. S1 then clicked on a component that was related to just about every other component, and the display became cluttered with yellow (unknown) components, as seen in the left image in Figure 4.9 on the preceding page. This made identification and selection of the desired components difficult, and the test was stopped.

The second student (S2) did not encounter a cluttered display during the introduction, and had fewer problems with selection. The slider was demonstrated and understood.

The third student (S3) encountered a cluttered display, but the shifted yellow components helped them avoid the inability to select desired components encountered by the others. However, they had some difficulty understanding the ‘differential selection’ interface used in the ‘shifted’ version, and had difficulty with selection.

We now summarise user performance on the actual tasks.

First Task

The students were to find a component in which they were doing badly.

S1 never got this far, having given up after encountering a cluttered display.

The second student (S2) had fewer problems with selection. They understood the colour scheme, readily identified a component in which they were doing badly, and selected it. As already noted, they never encountered a cluttered display.

The third student (S3) experimented a little more than the second, but the shifted yellow components helped in rescuing them from cluttered displays. They identified and selected a component in which they were doing badly.

Second Task

The students were to find a topic in which they were doing well.

S2 found such a component with no difficulty, as did S3.

Third Task

The students were to find a topic in which they were doing worse than most of their class.

S2 managed to get to the comparison between their own and the cohort’s marks. At this point the slider had been set to 67%, meaning that the only green components were ones in which the modelled student was doing 67% better than the class average, which didn’t leave many. This confused S2, although it was cleared up with some conversation, and S2 was able to complete the task. It should be added that soon after this, S2 encountered a cluttered display and stopped experimenting.

S3 was able to find the comparison view, adjust the slider to a low value, and pick a component as the task required.

Conclusions

A goal for the VlUM visualisation is the provision of an overview of large, poorly structured user models. This evaluation indicates that the approach has both promise and problems.

The problems identified are:

• the display becomes unreadable if too many components are in focus, and it is impossible to find or accurately select desired components without extensive use of the dragging mechanism.

• the more complex selection mechanism in the ‘shifted’ version seems difficult to understand.

• there is a problem with users dragging instead of selecting.

• the slider tends to be set to confusing values when the user chooses a new view.

The problem with the slider could be fixed by always resetting the slider to a middle value when the view changes. In this implementation it was simply set to the closest possible value to its position in the last view, when in fact there is no useful relation between that last position and its position in the new view.

The problem with users dragging topics instead of selecting could be helped with more appropriate feedback about the action, and also a drag threshold. In the third test we removed this option altogether, although an implementation of it still existed.

The main problem was the tendency of the display to obscure information by trying to show too many related nodes. This obscures other useful information, and makes selection of obscured nodes difficult. The shifting of ‘uninteresting’ topics seems to at least keep the topics that are most likely to be interesting selectable, although visibility is still somewhat limited if they are obscuring each other.

The topic selection problem on cluttered displays was partly solved by shifting the yellow topics, but at the cost of complicating the selection mechanism.

These problems were all exacerbated by the use of an uneven graph in the first place. Unfortunately the domain as it stands does tend to lead to uneven graphs, largely from bad categorisation of online assessment questions. The structure of the graph greatly affects its navigability, because the visualisation seems to work best on homogeneous trees with at most about ten children per node down to a depth of three. Too many children at these depths leads to too much information being displayed, and a cluttered display. Conversely, too few children can lead to a ‘dead end’ in the exploration of the data. Some methods for mitigating this were explored in the next implementation.

Chapter 5

VlUM 2.0

The final implementation addressed some of the problems described in the previous chapter. Details of these changes are covered in this chapter, but to summarise:

• The slider range problems were fixed by resetting the slider to a central point when the view was changed.

• The cluttering of the display when a large ‘cluster’ of related components was ex- posed was fixed by more careful preparation of the graphs.

• The use of the x axis of the display was extended to show a new dimension of the data. It now shows the certainty of the score for that component.

• In the first experiment the ‘dragging’ facility was turned off to remove the possibility of participants dragging instead of clicking on a component.

• The logic used for detecting which topic was selected by a mouse click was rethought and re-implemented.

• The RDF schema was changed to accommodate the new certainty measurement, as well as to more closely conform to the spirit of the RDF specification.

• Facilities for running evaluation experiments, including a quiz system and logging facilities, were added.

5.1 Appearances

The following section describes the elements of the VlUM 2.0 display, and, where relevant, how they have changed from earlier versions. Figure 5.1 on the following page shows the display running with data from the Online Assessment data set. The VlUM 2.0 tool can be seen running in a browser, showing the selected component (a movie) in Figure 5.2 on page 79.


Figure 5.1: The visualisation in VlUM 2.0.

Figure 5.2: VlUM 2.0 in a web browser, showing the currently selected movie.

5.1.1 Menu Bar

Display

The Display menu at the top left of the display allows the student to choose between views. The menu as configured in the online assessment model can be seen in Figure 5.3 on the following page. In the online assessment model, the default view shows the user's model, with red-green indicating performance according to the standard set by the user. It is also possible to see a cohort view which shows the same information, but this time for the average of the cohort. In addition, the user can select one of the ‘difference’ views, which show the individual compared against the cohort, the faculty expectations, or against themselves one month previously. In these cases, green indicates the individual is doing better than the comparison data set, and red that they are doing worse. The Display menu is disabled if there is only one data set, as is the case for the IMDB data.

Action

The Action menu allows users to view the component web page in an adjacent frame, or view the evidence for the value of the selected component, as in Figure 5.4 on page 81. Users can also search through the list of components by name with a search dialog. This is an important adjunct to the visualisation display. Consider, for example, the case where a student has the learning topic Management of pneumonia in mind but cannot find it on the visualisation display.

Figure 5.3: VlUM 2.0 in a web browser, showing the DISPLAY menu for the Online Assessment model. The Learning Topic shown is mocked up.

This menu allows them to search for it, making Management of pneumonia the currently selected component in the display.

This menu also had a ‘back’ option that could be used to take the user back through the session. Selecting ‘back’ would re-select the previously selected component. A list of previously selected components existed for the entire browsing session, so it was possible to revisit the entire session through repeated use of the ‘back’ menu option.

Help

The Help menu has no contents, but, when clicked, shows a help text in a browser window. The help text explains the basic operation of VlUM 2.0.

5.1.2 Slider

As before, a slider above the main pane can be used to set the separation point between red and green (i.e. the ‘category boundary’, or in this context perhaps a ‘pass mark’). This can be used to maintain the spirit of problem-based learning, where students accept responsibility for monitoring and planning their own learning. The slider allows students to set the standard they require of themselves. Figure 5.5 on page 82 shows two instances of the same view, but with the pass mark set to 10% and 90%. The relative amount of red and green changes accordingly.

Figure 5.4: VlUM 2.0 in a web browser, showing evidence for the currently selected Learning Topic. The evidence in this image is mocked up.

The need for the slider is not limited to problem-based learning domains. As explained in Section 1.1.2 on page 8, the notion of ‘truth’ can be relative in a user model with data contributed and used by a multitude of tools and consumers. It is imperative that a user model consumer have some way of adjusting the standard for assessing the data in a user model. In the movies domain explored here, it enables the user to define the standard for judging a movie as interesting.

5.1.3 The Display

When viewing a user model, the user typically wishes to view related information at the same time. For instance, in the movies domain, the user model may seem to be recommending a particular movie. It would be useful to be able to show information about that movie at the same time. Similarly, the user may wish to see an explanation of the model's belief value for a component while still viewing the overview. Thus the VlUM visualisation exists in a vertical segment of the screen, leaving room for a web page to be displayed in full to its right.

The VlUM 2.0 display, shown in Figure 5.1 on page 78, exists in a pane of about 350×600 pixels.

Figure 5.5: Comparison of the graph shown on the left of Figure 5.1 on page 78 with pass mark set to 10% on the left, and 90% on the right.

For example, the image in Figure 5.1 shows that the selected component is Effect of infection on pregnancy (role of placenta), in which the student is not doing well at all.

If a user is given VlUM 2.0 in the state shown in Figure 5.1, and clicks on Effect of virus on host cells, which is slightly above Effect of infection on pregnancy (role of placenta), VlUM 2.0 will change to the state shown in the right image of Figure 5.6 on the next page. The spanning tree has been recalculated to place Effect of virus on host cells at the root, and Effect of infection on pregnancy (role of placenta) at a lower depth. Since the distance between the topics in the graph has not changed, the sizes of the two components are essentially swapped. However, the relative positions of the components on the display will change to reflect the new spanning tree. Titles are not fixed to any particular position, but move as the spanning tree causes ‘warping’ of the display surface; as already mentioned, the components are always displayed in the order they appear in the model file.

Figure 5.6: Two examples of the model visualisation. The viewer starts as seen on the left. Clicking on the topic Effect of virus on host cells moves it to the view seen on the right.

Figure 5.7: The component under the mouse is Effect of virus on host cells. A component becomes white when the mouse pointer is moved over it.


Figure 5.8: The implementation of component selection in VlUM 2.0. At the top left we see the component within its bounding box. The other three diagrams show three overlapping components, represented by their bounding boxes. In each diagram the selected component is the one that contains the mouse pointer and that has a vertical boundary nearest the pointer tip.

Figure 5.9: The selection algorithm in VlUM 2.0 selects the component with an end closest to the mouse pointer. In the two diagrams above different components are selected because the mouse pointer has been moved horizontally, and thus changes which component has a closer end. The dotted line shows that the pointer is in the same vertical position. This method works well for disambiguating selection of components if the user selects near the end of the component they are interested in. It may cause confusion for other strategies.

5.1.4 Selection

The final implementation of selection uses the bounding box of the drawn component as the selection region. If more than one bounding box lies under the mouse pointer tip, the component whose nearest vertical boundary is closest to the pointer is chosen, as shown in Figure 5.8.
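A minimal sketch of this disambiguation rule, assuming the bounding boxes of the drawn components are available:

```java
import java.awt.Rectangle;
import java.util.List;

// Among all components whose bounding boxes contain the pointer, pick the one
// whose nearest vertical (left or right) edge is closest horizontally to it.
class Selector {
    static int select(List<Rectangle> boxes, int mx, int my) {
        int best = -1;
        int bestDist = Integer.MAX_VALUE;
        for (int i = 0; i < boxes.size(); i++) {
            Rectangle r = boxes.get(i);
            if (!r.contains(mx, my)) continue;        // pointer not over this box
            int toLeft = mx - r.x;                    // distance to left edge
            int toRight = (r.x + r.width) - mx;       // distance to right edge
            int dist = Math.min(toLeft, toRight);
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best; // index of the selected component, or -1 if none
    }
}
```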

This method of resolving the ambiguity in selecting overlapping components is based on an assumption that a user, when attempting to select a component in a cluttered region, will tend to focus on selecting at an end of the component. This assumption was based mainly on observation of my own strategies, and has not been tested. However, the selection method does work better than earlier methods, allowing accurate selection of components in cluttered areas. The only drawback of the method is that different components are chosen by the algorithm based on the horizontal position of the mouse pointer, and therefore the highlighted component can change with only a horizontal move of the mouse, as shown in Figure 5.9. This could lead to some confusion in particularly observant users, but I have not observed any problems in practice.

As the mouse is moved around the display, the topic that would be selected at that point is highlighted by painting it white, as seen in Figure 5.7 on the preceding page. Also, information about the highlighted component is displayed in the status bar, as explained in the next section.

5.1.5 Status Bar

As can be seen in Figure 5.1 on page 78, the last element of the VlUM 2.0 display is the status bar. This is a standard interface element in most graphical tools. In the figure, VlUM 2.0 is shown running in the Java Appletviewer tool, but a status bar exists in most web browsers as well. VlUM 2.0 uses the status bar to show the name, score and recommendation of the component the mouse is currently over in the VlUM 2.0 display. For instance, if the mouse pointer was resting on Roman Holiday, the status bar would show

Roman Holiday score 80.0% certainty 76.4539%

As one participant pointed out, no effort was made to round the score and certainty values to reasonable precision. This was a small oversight on my part. Also, the status information shown in Figure 5.1 is truncated. This is due to the narrow window of the appletviewer in which VlUM 2.0 is seen in that figure. In a real web browser the status information line is longer.

The text shown by VlUM 2.0 in the status bar is particularly important because it is the only place that absolute figures for score and reliability are given. If a user is trying to decide between two concepts that have very nearly the same scores, they will probably have to use the information in the status bar to find the right one.

5.1.6 Experiments in Anti-Aliasing and Transparency

Anti-aliasing shades pixels on the edge of a diagonal line on the display to make the line appear less ‘pixelated’. This can improve the readability of text, and is used in packages like ADOBE ACROBAT for this purpose. I used the anti-aliasing capabilities of the Java2D framework to anti-alias the components shown in the VlUM 2.0 display. Besides slowing redrawing to about half the previous speed, this introduced problems with the highlighting of components on ‘mouseover’. When the mouse is placed over a component, the component is redrawn in white, and redrawn again in its normal colour when the mouse is moved off. The region is not cleared between draws. The anti-aliasing algorithm seemed to draw the component differently in different colours, so the redrawing would leave pixels of the previous colour on the outline of the text. Thus moving the mouse around left stray white pixels scattered across the display.

Transparency could be used to allow, for instance, bright red components to ‘show through’ the components clustered over them, allowing users searching for such components to find them more easily. Again, this effect can be provided by the Java2D package. There are some issues regarding what α (transparency) value to choose so that titles are still readable. However, these were never addressed, because initial trials showed that enabling the effect slowed redraw by about an order of magnitude, eliminating the technique on efficiency grounds.

Figure 5.10: Comparisons between drawing methods. Top left is the plain drawing. Top right is anti-aliased. Bottom left uses alpha-transparency, and bottom right uses both transparency and anti-aliasing. There is an increasing tradeoff between legibility and CPU usage.

Screen shots of a region of the display with each of these effects enabled are shown in Figure 5.10.

Although these techniques seemed promising, and have been used to good effect in other projects [Sma96], there seemed little to be gained from using them in VlUM 2.0. The display was readable as it was, and this was not improved significantly by anti-aliasing or transparency. In addition, the resource demands were simply too great and the system became unusable with these features enabled. The features were not enabled for any user tests.
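Both effects were switched on through the standard Java2D hooks. The following sketch shows the sort of code involved; the Graphics2D calls are the standard API, while the surrounding panel class and the 60% opacity figure are illustrative:

    import java.awt.AlphaComposite;
    import java.awt.Graphics;
    import java.awt.Graphics2D;
    import java.awt.RenderingHints;
    import javax.swing.JComponent;

    public class EffectsPanel extends JComponent {
        public void paintComponent(Graphics g) {
            Graphics2D g2 = (Graphics2D) g;
            // Anti-alias text and shapes: smoother output, but it roughly
            // halved redraw speed in VlUM 2.0.
            g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                                RenderingHints.VALUE_ANTIALIAS_ON);
            // Draw titles partially transparent so occluded components show
            // through; this slowed redraw by about an order of magnitude.
            g2.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_OVER, 0.6f));
            g2.drawString("Roman Holiday", 10, 20);
        }
    }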

5.2 File Format

The RDF schema used in this implementation was slightly different from previous versions. Using plain attributes for the peers was legal RDF, but wrong in spirit: peers are really references to other resources, and should be written as such in the graph.

[Figure 5.11 appears here. The recoverable content of the graph shows the resource http://www.gmp.usyd.edu.au/topics/323.html with the dc:Title ‘Mechanisms of Pain’, gmp:peer references to http://www.gmp.usyd.edu.au/topics/342.html and http://www.gmp.usyd.edu.au/topics/445.html, and gmp:results beliefs carrying gmp:mark values of 0.86 and gmp:reliability values of 0.75 and 0.5 for the gmp:cohort and gmp:average datasets.]

Figure 5.11: An RDF entry in VlUM 2.0. Peers are now references to RDF resources. This example is taken from the assessment domain. In the graph I only show all the belief (gmp:results) resources (gmp:reliability and gmp:mark) for two of the resources. Note that gmp:reliability is the certainty, and gmp:mark the score of the belief. Other result resources in the text file but not in the diagram are shown in italics. Similarly, only two peers are shown in the diagram. Other peers in the text representation not in the diagram are shown in italics.

[Figure 5.12 appears here: the RDF/XML serialisation of the movie Roman Holiday from the Internet Movie Database.]

Figure 5.12: RDF Serialisation of one of the movies from the Internet Movie Database. The use of the gmp: namespace for the movie domain is historical. The namespace should be made domain neutral in future.

This was done, and the RDF graph became that shown in Figures 5.11 on the page before and 5.12. Since peers were now explicitly stated, the vestigial keyword statements were dropped. Finally, it should be noted that the terms used in the RDF file do not match those used in the text of this thesis: gmp:results are here referred to as beliefs. Similarly, gmp:reliability is a certainty about a belief, while gmp:mark is a belief value.

The final implementation also introduced the concept of certainty of a belief. Moving a belief from a binary relationship to a ternary one necessitated moving each belief to an anonymous resource with attributes for the score and certainty. For generality each belief is now seen by the parent resource as a ‘gmp:results’ statement, pointing to the anonymous resource that has the score and certainty attribute and also an attribute identifying the belief. Finally, the ‘GMP:Name’ attribute was changed to the more widely used ‘dc:Title’ attribute from the Dublin Core metadata standard.
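The structure just described can be illustrated with a small RDF/XML sketch. The topic URI, title, peer URIs and the numeric values are those recoverable from Figure 5.11; the gmp: namespace URI and the exact serialisation details are assumptions:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:gmp="http://www.gmp.usyd.edu.au/ns#">  <!-- namespace URI assumed -->
      <rdf:Description rdf:about="http://www.gmp.usyd.edu.au/topics/323.html">
        <dc:Title>Mechanisms of Pain</dc:Title>
        <!-- Each belief is an anonymous resource with score and certainty;
             how the dataset is identified here is an assumption. -->
        <gmp:results>
          <rdf:Description>
            <gmp:dataset>gmp:average</gmp:dataset>
            <gmp:mark>0.86</gmp:mark>
            <gmp:reliability>0.5</gmp:reliability>
          </rdf:Description>
        </gmp:results>
        <!-- Peers are references to other resources, not plain attributes. -->
        <gmp:peer rdf:resource="http://www.gmp.usyd.edu.au/topics/342.html"/>
        <gmp:peer rdf:resource="http://www.gmp.usyd.edu.au/topics/445.html"/>
      </rdf:Description>
    </rdf:RDF>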

These changes increased the expressive power of the model. It is now possible to attach any number of values to a component, and the relationships between components are clearly structured. The direct peering of components also brings the model into line with emerging practices in the RDF community. This in turn should make it easier to use the model in other RDF consuming systems.

5.2.1 Startup

Upon startup, VlUM 2.0 goes through the following steps.

1. The basic structures are allocated.

2. The configuration file, referenced in the applet tag, is loaded and parsed. This file (an example is shown in Figure 5.13 on the next page) is well-formed XML that describes the Display menus to be created.

[Figure 5.13 appears here. The recoverable content of the configuration defines three displays: ‘Your Marks’ (gmp:average, ‘Average marks for you’), ‘Your Year’ (gmp:cohort, ‘Average marks for your year’), and ‘You v Year’ (gmp:averageVsCohort, comparing gmp:average with gmp:cohort, ‘How you’re going compared to your year’).]

Figure 5.13: XML file for configuration of VlUM 2.0. In this case VlUM 2.0 would show three displays: an average, a cohort average, and a comparison between the two. The slider is enabled in this comparison, but could be disabled by setting the adjustable attribute to false in the 13th line.

3. The model file referenced in the applet tag is also loaded and parsed. All XML parsing in this implementation of VlUM 2.0 uses a standard package to build a Document Object Model (DOM) tree. This allows the data hierarchy in the file to be accessed with a standard interface. The DOM tree for the model file is then inspected and a more efficient representation of the graph is built. Indexes of node name and internal ID are generated, and peers are given references to each other. (A sketch of this step follows the list.)

4. The selected component is then set. By default this is the component with the lowest score, on the assumption that a student using VlUM 2.0 for study might find that the most useful component. An applet parameter can be used to adjust this.
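As flagged in step 3, here is a minimal sketch of the model-loading step using the standard Java DOM API; the class name, the choice of rdf:Description as the element to count, and the printed message are illustrative assumptions:

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    public class ModelLoader {
        /** Parse the model file into a DOM tree, then walk it once. */
        static Document load(String url) throws Exception {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(url);  // accepts a URI string
            // The DOM would then be inspected to build the efficient internal
            // graph: index each resource by name and internal ID, and give
            // peers direct references to each other.
            NodeList resources = doc.getElementsByTagNameNS(
                    "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "Description");
            System.out.println("model size " + resources.getLength());
            return doc;
        }
    }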

5.3 A Software Environment for Managing and Monitoring Experiments

I wanted to offer the capability for users to participate in the experiment regardless of location. If VlUM 2.0 was designed to be used within the heterogeneous and location independent Web, then perhaps the evaluation could be as well. To do this, I designed a system consisting of three major components:

• A system for automating the presentation of questions to the participants from within the same web page in which VlUM 2.0 was running.

• A reliable method of logging the user interaction during the experiment to a server.

• Features to ease analysis of large numbers of user trials.

Essentially, the environment supports experiments where the user is set a task, the system logs the time at which the task is presented to the user, and then the task and the log are combined to mine data from the experiment in a semi-automated fashion. The system as a whole is not strongly tied to VlUM 2.0, and can be easily reused in other experiments and web based systems.

5.3.1 Asking Questions

Any experiment that uses this system needs to be based on a file of questions. Since VlUM 2.0 already contains an XML parser, the file is in an XML format, with the major element being the questions themselves. A minimal file might be

[XML listing garbled in extraction; the minimal file contains a single question whose text is ‘Hello World’.]

which would present a question with that text in a frame of the web page. A more useful question would contain some of the following attributes:

ID This is the ‘name’ of the question. It is used to refer to the question when analysing results.

type A question may be of a particular type. The default is not to have any special controls for user input, although all questions contain a text area at the bottom for miscellaneous feedback, as can be seen in the top middle frame of Figure 5.14 on the facing page. Other supported types are:

input These questions will show a single line text field below the question text to receive user input, like the question about the user’s age in Figure 5.15 on the next page. The question may also have an attribute cols which sets the length of the text field.

radio These questions will show a set of radio buttons below the question text. The buttons will simply be labelled from 0 to the number shown. There are four shown and the second last has been selected in Figure 5.16 on page 92. The number of buttons to show is given by a further attribute in the question element, choices.

Figure 5.14: A normal Question. There are four frames in the window. The frame on the left shows the VlUM 2.0 tool. The frame in the middle on the top shows the question frame. The top right frame shows some reminders about how to use VlUM 2.0, and the bottom right frame gives a basic initial tutorial.

Figure 5.15: An age Question

Figure 5.16: A radio Question

Figure 5.17: A comment Question

[Listing garbled in extraction. Recoverable fragments include the text of the age question (‘What is your age? Answer in the box below, and then use the -->> link below to move to the next page.’), a script element (parent.browser.document.squidgeApplet.resetSquidge();) used to reset VlUM 2.0 between tasks, and the text of the ‘chourSelect’ question (‘Find the movie called Children’s Hour, The and select it.’).]

Figure 5.18: Examples from the XML Question file used to run user tests on VlUM 2.0.

textarea These questions will show an additional text area below the question text. The question element may contain further attributes, rows and cols, to set the size of the text area, as in Figure 5.17 on the preceding page.

reason This attribute contains a text string, which should contain the reason or reasons for setting the question. For instance, in the movie experiment in Chapter 6 some questions were testing how well ‘easy’ targets were selected, and so the reason attribute might contain reason="easy".

pymark This attribute contains a python expression that, when executed in a defined environment, evaluates whether the question has been answered correctly or not for that trial. The method of marking trials is explained in depth in Section 5.3.3 on page 96.

A more complete example is found in Figure 5.18. In this example, there is a numeric input question asking the participant’s age, a question with no ID (that therefore will not be marked) that also contains a snippet of JavaScript to reset VlUM 2.0, and a question called “chourSelect” that exists to test ‘easy’. “chourSelect” was answered correctly if “movie==’Children’s Hour, The’ ”.
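Since the listing in Figure 5.18 did not survive reproduction here, the following is a hedged reconstruction of what such a file looks like. The attribute names (ID, type, cols, reason, pymark) are those documented above; the question element name and the exact layout are assumptions:

    <question ID="age" type="input" cols="5">
      What is your age? Answer in the box below, and then use the
      -->> link below to move to the next page.
    </question>

    <question>
      <script> parent.browser.document.squidgeApplet.resetSquidge(); </script>
      In the frame on the left is the movie list, with movie titles ordered
      from oldest movies near the top to most recent at the bottom. As you
      can see, <em> Directed by </em> has been selected.
    </question>

    <question ID="chourSelect" reason="easy"
              pymark="movie=='Children's Hour, The'">
      Find the movie called <em> Children's Hour, The </em> and select it.
    </question>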

5.3.2 Logging

VlUM 2.0 is instrumented for logging to aid results gathering during the experiments, as is often done in human-computer studies [Bos87, SH91]. The client generated an in-memory log of the session. The log contained lines consisting of a timestamp, the time since the last logged event, and a string that contained the actual event. VlUM 2.0 has a scriptable method that saves the log to the server. This method is called in a number of event handlers attached to the windows and frames of the VlUM 2.0 browser window in an effort to make sure that it is saved at least once no matter how the participant exited the system. If the log is saved twice the server creates two separate files. In order to make it clear that some of the files are largely repeats of others, the client marks all saved logs except the first with a special heading string.
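A minimal sketch of the client-side log structure, assuming the ‘timestamp|||delta|||event’ record format visible in Figure 5.19; the class and method names are illustrative:

    import java.util.ArrayList;
    import java.util.List;

    public class SessionLog {
        private final List<String> lines = new ArrayList<String>();
        private long last = System.currentTimeMillis();

        /** Append one event: timestamp, milliseconds since last event, text. */
        synchronized void log(String event) {
            long now = System.currentTimeMillis();
            lines.add(now + "|||" + (now - last) + "|||" + event);
            last = now;
        }

        /** The whole log, ready to be POSTed to the collecting servlet. */
        synchronized String dump() {
            StringBuilder sb = new StringBuilder("#log format v 2.1\n");
            for (String line : lines) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }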

The web server runs a servlet that simply accepts a document via the HTTP POST method and saves it into a particular directory under a filename consisting of the time the request was received. This method was preferred to using HTTP PUT as PUT is not widely supported, and its use in a configuration such as this could be a security problem. Timestamps in both the client and server are simply long decimals generated by the Java java.util.Date.getTime() method, which returns the number of milliseconds since 1/1/1970, 00:00:00 GMT.
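Such a servlet amounts to very little code. A sketch under the standard servlet API, with the class name and target directory invented for illustration:

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class LogDropServlet extends HttpServlet {
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            // Name the file after the arrival time: milliseconds since
            // 1/1/1970, 00:00:00 GMT, matching the client timestamps.
            String name = "/var/spool/vlum-logs/" + System.currentTimeMillis();
            InputStream in = req.getInputStream();
            OutputStream out = new FileOutputStream(name);
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
            out.close();
            resp.setStatus(HttpServletResponse.SC_OK);
        }
    }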

Earlier implementations of the logging facility used the native Java Remote Method Invocation (RMI) to send events to be logged to a logging server as they happened. The connection to the server was established at the start of the VlUM 2.0 session, and maintained until exit, so there was no need to send possibly redundant logs to the server. Unfortunately, RMI is not passed by some firewalls, making log collection by this method unreliable. The HTTP solution was implemented to work around this restriction.

The log in Figure 5.19 on the next page shows the start of a typical example of the resulting log file. The first line is a comment (the # character marks it as such) that states the log file version. The session starts in line two. This line consists of a timestamp, then milliseconds since the generation of the previous line (60), and a comment. Fields are separated by ‘|||’. Line three shows that VlUM 2.0 had been set to animate component position changes, while lines four and five log the loading of the user model from the URI given. Lines six and seven print some statistics on the model. Lines eight to ten show VlUM 2.0 setting the selected component to the film Directed by William Wyler, along with some more information about that node in the movie graph. Lines eleven and twelve mark the beginning and end of the animation of that selection. VlUM 2.0 was then set to a particular display in line thirteen (although since there is only one in the movie domain this is redundant). VlUM 2.0 then presented the first question in the experiment as in line fourteen. No user action was required in this question, but this user provided some feedback in any case in lines sixteen to eighteen. Some other questions were presented on lines nineteen, twenty two and twenty four, but no action required. In line twenty six VlUM 2.0 was instructed by the question to reset, which is logged. Lines twenty seven to thirty one show the steps of the reset.

     1  #log format v 2.1
     2  945484497420|||60|||start session
     3  945484497470|||50|||doAnimate to true
     4  945484500000|||2530|||model is
     5      http://people.gmp.usyd.edu.au/hemul/sq2p/gendata/movies300.rdf
     6  945484506920|||6920|||model size 300
     7  945484506920|||0|||maxPeers 6 min 6 average 6.0
     8  945484507850|||930|||set topic to |Directed by William Wyler|,
     9      index 289, mark 0.95, reliability 0.261391,
    10      peers 33|171|34|121|135|147|
    11  945484507850|||0|||starting animation
    12  945484508290|||440|||animation done - held 450
    13  945484508290|||0|||changed to display 0 - What you would like
    14  945484509720|||1430|||presented Question 0 id 't-welcome'
    15  945484510160|||440|||resize to x=350, y=645
    16  945484603420|||93260|||Question 0 feedback interesting arrangement
    17      of titles...looks great... but wouldn't it be hard to see
    18      actuall titles ?
    19  945484603810|||390|||presented Question 1 id 'age'
    20  945484615290|||11480|||Question 1 response 23
    21  945484615290|||0|||Question 1 feedback
    22  945484615780|||490|||presented Question 2 id ''
    23  945484635390|||19610|||Question 2 feedback
    24  945484635830|||440|||presented Question 3 id ''
    25  945484642970|||7140|||Question 3 feedback
    26  945484643460|||490|||--- resetting ---
    27  945484643460|||0|||changed to display 0 - What you would like
    28  945484644340|||880|||set topic to |Directed by William Wyler|,
    29      index 289, mark 0.95, reliability 0.261391,
    30      peers 33|171|34|121|135|147|
    31  945484644340|||0|||--- reset ---

Figure 5.19: A log from a VlUM 2.0 session. Some lines have been wrapped for display on this page.

The following events are logged by VlUM 2.0:

• The URL to the model and number of resources in the model.

• The size of the actual visualisation panel at startup and on any resize.

• Component selection, with the name of the component, its score, its certainty, and the indexes of the component’s peers.

• The start and end of any animation.

• Any user feedback from a question.

• Use of any menu option.

• Use of the slider.

• If dragging of components is enabled, mouse positions while dragging.

It should be noted that in logging user actions we are allowing ourselves to build a model of the user. In this case we are simply evaluating the users’ ability to use the software, but another possible use for this logging is in building more active user models. For instance, the contents of the log could be used to identify user misconceptions and coach them online, as was done with the sam text editor [CKRT95, KG93].

5.3.3 Marking Questions

The file format for questions shown in Section 5.3.1 contains an attribute, pymark, for marking questions. It is a python expression that, when executed in a defined environment, indicates whether the question was answered correctly. In the case of question “chourSelect” of Figure 5.18 on page 93, we check that the selected movie was Children’s Hour, The. The pymark string is run in an environment specified by the author of the analysis scripts. In the case of the VlUM 2.0 experiments, it can contain references to ‘msize’, the size of the data set being tested, ‘cert’, the certainty of the selected component, ‘rec’, the score of the selected component, and specialised functions written for the particular test. Some examples might be

• The selected component must have a certainty of < 0.3 and a recommendation < 0.5.

cert <= 0.3 and rec <= 0.5

so that we can determine whether the user selected a component with certainty below 0.3 on a recommendation less than 0.5.

• The selected component must be the peer of the component selected in ‘sbSelect’ with the maximum certainty.

index == maxpeercert(’sbSelect’, logname)

so that we can see whether the selected component is the peer with the most certainty of the component selected in question ‘sbSelect’.

• The selected component must be ‘Roman Holiday’ if the data set size is 100, or ‘Smash-Up, the Story of a Woman’ otherwise.

(msize == 100 and movie == ’Roman Holiday’) or (msize != 100 and movie == ’Smash-Up, the Story of a Woman’)

Notice that < and > signs must be escaped for inclusion in the XML file. The function ‘maxpeercert’ in the second example finds, among the peers of the given component, the one with the maximum certainty. An analysis script for an experiment may define any function we deem necessary. In this case logname passes the name of the log to maxpeercert, so necessary information about the data set size can be retrieved.
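A minimal sketch of how an analysis script can evaluate a pymark string, assuming it builds one namespace per trial; the variable names follow the text, while the dictionary values are invented for illustration:

    # Environment for one trial; in the real scripts these values would come
    # from the parsed log and the user model.
    env = {
        'msize': 300,              # size of the data set being tested
        'cert': 0.25,              # certainty of the selected component
        'rec': 0.4,                # score of the selected component
        'movie': 'Roman Holiday',  # title of the selected component
    }

    pymark = "cert <= 0.3 and rec <= 0.5"

    # eval() runs the expression in the trial's namespace; True means the
    # question was answered correctly in this trial.
    correct = eval(pymark, env)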

5.3.4 Benefits

The framework outlined in this section offers a number of benefits. By keeping the questions, their reasons, and their method of marking in the same file, experiments are easier to design and manage. The use of XML for the file allows its parsing without adding to the size of the client application (which probably already has an XML parser). Also, the analysis scripts can use the same Java classes for managing the question file as the client application.

Chapter 6

Evaluation

VlUM 2.0 was tested on data from the Internet Movie Database. The experiment aimed to evaluate the effectiveness of the core functionality of VlUM 2.0. It tested users’ ability to understand the data displayed, and to find answers to core questions we wished to support in the visualisation of large user models. The experiment also tested the sensitivity of the VlUM 2.0 visualisation to data size, and evaluated whether a short on-line tutorial was enough to enable users to do a range of tasks accurately and quickly.

6.1 Aim

The experiment was designed to assess the usability of all major goal functions of the VlUM 2.0 visualisation. It was intended to determine how quickly and accurately a novice

• can navigate around the graph structure. In particular:

– can find and select a movie that is a direct peer of the currently selected component.
– can find and select a movie that is a ‘cousin’ (separated by two links) of the currently selected movie.
– can find and select any named movie, usually by using the search feature.

• understands how the ‘recommendation’ (belief value) of the topic is displayed. In particular how accurately and quickly a novice can find and select a movie that is

– recommended to a specific degree.
– most strongly recommended or not recommended.

• understands how the ‘certainty’ of the topic is displayed. In particular how accurately and quickly a novice can find and select a movie about which the model is

– certain to a particular degree.


– quite certain.

I also wished to assess how the number of titles in the display affects the speed and accuracy of finding and selecting the desired topic.

6.2 Method

Since I wished to find as many participants as possible, I chose to use the model from the ratings system of the IMDB described in Section 3.2 on page 54, instead of the Online Assessment model. I felt that this domain would be more readily understood by people who were not medical students. To reduce variability, I produced models for a single hypothetical user, named ‘Gina’. Four models for Gina, of 100, 300, 500 and 700 titles, were generated. These sizes were selected to give a good range of data sizes, starting at slightly more titles than I was aiming to minimally support, and ending at somewhat more titles than there are learning topics in the Graduate Medical Program described in Section 3.1 on page 45. I thought it likely that the usability would start to degrade at less than 700 items, given that at this model size there would be more than one title per vertical pixel. That range of 100–700 was divided to give four data set sizes, as four data points should give an indication of how the visualisation performs as the number of components increases.

Participants were drawn from computer science undergraduate and postgraduate students, staff, and general acquaintances. Age ranged from 19 to 53. The sessions were completed at different times within a one week period. Typically, participants were instructed to load the experiment using a web browser, and then left to complete the session on their own. This gave us the ability to run many sessions, but at the cost of data that may have been gained from close observation of each participant.

The question file shown in Appendix C on page 149 was prepared. These questions started with a brief tutorial on the use of VlUM 2.0 using a dataset of 300 titles. The tutorial was designed to give the participants an introduction to the basic concepts and operation of VlUM 2.0, and also gave a baseline for the performance of each participant. It covered selecting components, both from obviously visible components and by using the search feature. It then covered the score and certainty measures, and how each is displayed. Finally, the tutorial explained how the slider could be used to set the colour change point, and how this could be used to help find titles of a given recommendation. The tutorial was the only introduction the participants were given.

The experiment was trialed on several computer science postgraduate students and some members of the computer science faculty.

There were nine questions to which the answer could be marked. Of these,

• one demonstrated selecting a peer of the current selection

• one showed the participant how recommendation was shown in the display

• one asked the participant the recommendation of the current selection

• one described how certainty was shown

• two demonstrated how a title could be selected by both certainty and recommendation.

• one showed the use of the search function

• one demonstrated a selection of a title that was both a peer of the current selection, and that also had a particular certainty.

• one demonstrated the use of the slider to highlight titles of a particular recommendation. This one also used the search function.

Questions were prepared to test six activities and combinations of some of these. The activities, and the number of questions for each, were

• Navigation

EASY (n=2) Selection of a peer of the current selection, as in Figure 6.1 on the following page. For instance, an EASY question might be ‘Find a movie called Children’s Hour, The and select it.’

COUSIN (n=2) Selection of a peer-peer of the current selection, as in Figure 6.2 on page 103. For example, ‘Find and select Roman Holiday’, where Roman Holiday is a peer of a peer of the current selection.

HARD (n=2) Selection of a topic by name, that is not necessarily closely related to the current selection, as in Figure H.1 on page 221. For example ‘Find and Select Superman II.’ where Superman II is not necessarily easily visible.

• Scores

REC (n=2) Selection of a title based on its recommendation, as in Figure H.2 on page 222. An example would be ‘Find and select a movie that has a recommendation of > 80%’.

CERT (n=2) Selection of a title based on the certainty of the recommendation in the model, as in Figure H.3 on page 223. An example might be ‘Find and select a movie about which we are uncertain whether Gina would like or not like it’.

• Combinations of the above

CERTEASY (n=1) Selection of a title that is a peer of the current selection that also has a specified ‘certainty’. An example is shown in Figure H.4 on page 224.

CERTHARD (n=1) Selection of a title that is probably not a peer of the current selection that also has a specified ‘certainty’. An example is in Figure H.5 on page 225.

Figure 6.1: An EASY task. The title Directed by William Wyler is selected. The question may ask the participant to select a peer of the selected title. The size and spacing of Dead End indicates that it is a peer of the selected title, so it is selected to complete the task. Alternatively, a participant may be asked to find and select Dead End, which should be easy because it is a peer.

HARDREC (n=1) Selection of the title that contains a given substring and also is of a specified recommendation. An example is in Figure H.6 on page 225.

RECSLIDER (n=2) Using the slider to help find titles of a given recommendation. An example is in Figure H.7 on page 226.

EASYREC (n=1) Selection of a title that is a peer of the current selection that also has a specified recommendation. An example is in Figure H.8 on page 226.

CERTREC (n=1) Selection of a title based on both recommendation and certainty. An example is in Figure H.9 on page 227.

Due to the already lengthy nature of the test, some marginal combinations were not tested. In particular I did not test the combinations of CERT and REC tasks with the COUSIN task, the results of which would probably be similar to the combination with the EASY task.

Once the participants had completed the tutorial, the questions file instructed VlUM 2.0 to load one of the four data sets. Which data set, and therefore the size of the data set, was chosen at random. The questions file ensured that all data sets started with the same title (Directed by William Wyler) selected.

Figure 6.2: A COUSIN task. The title Dead End is selected. The question may ask the participant to select Children’s Hour, The, which is a peer of a peer of Dead End, and hence a little harder to see. Children’s Hour, The is selected to complete the task.

Some 57 logs were collected over the course of the experiment. Most participants were randomly selected from people near some department computing laboratories on a single working day. They represented all years of a Computer Science course and Department office staff. Some other participants were medical students, and friends and family of the investigator. No register of participants was kept, although participants did enter their age, which was logged with the trial.

The participants were given identical data sets for the tutorial (the 300 item IMDB set), and then a randomly selected set for the main test. The sets were all based on the IMDB data, but differed in size to see how the size of the data set affected performance.

There were 57 participants overall. Of these, 10 received the 100 item set, 14 received the 300, 14 the 500, and 18 the 700 item set. For the first few participants, the algorithm for choosing the random data set was accidentally skewed to choose larger sets, hence the difference in the number of participants completing the tasks on 100 and 700 item sets. This problem was corrected in time to capture a good number of trials on 100 items.

The appearance of VlUM 2.0 at the start of the test is shown in Figure 6.3 on the next page.

6.3 Analysis

The logs from the experiment sessions were analysed by a script written in JYTHON that consumed the individual logs, the questions file, and the user models, and produced text files suitable for plotting with GRAP, and some sundry statistics and logging information.

Each log was first analysed to find the data set size used and average animation times. The logs were then parsed line by line to find which questions were answered, and how. The ‘pymark’ string for each question could then be evaluated to actually mark the question for that log. All this information was then stored in various internal structures.

A file containing all wrong answers was then created. The records in this file referenced the particular log, the question, the answer given, and what should have been given. This was used to help check that these answers were actually wrong. In some cases a manual review of the log showed that the answer was in fact correct, but had been done in a way that confused the analysis script. For instance, in some cases it was possible for the log to show that a topic had been selected before it logged the start of the question. In these cases the log file was modified to clarify the users’ actions. The data was then reanalysed to take account of these changes.

I was interested in three measures for each task:

Time to answer the question. This was measured from the time the question was presented, to the time the last answer was chosen for that question. This gave an indication of how long it took a participant to read, understand, and complete the task. One of the goals of the interface is to enable users to complete these tasks quickly.

Correctness of the answer. It is no good for a user to get an answer quickly if it is not acceptably correct.

Steps to answer the question. The number of steps indicates how long a participant had to ‘fiddle’ to find the answer. The fewer steps, the more efficiently the participant answered the question.

The data gleaned from the logs was therefore used to generate:

Task graphs The GRAP source for the graphs you see in this chapter and Appendix D. These graphs show the average time taken, mark, and steps taken for each task and each data set size. Extremely outlying data points were removed from these graphs.

Figure 6.3: The VlUM 2.0 applet as it appeared to participants at the start of the test, after the tutorial. At the top left is the 100 item set, top right is the 300 item set, bottom left the 500 item, and bottom right the 700.

In the graphs of time taken, points that were more than ten times the average were removed, as they were only the result of participant distraction. These graphs also include lines indicating the first and third quartile range of the data. In the steps taken graphs, points more than 5 times the average were removed, as in these cases the participant was probably playing rather than trying to answer the question.

Log summary A table showing the data set size, participant age, and average mark, time and steps for each log.

Task summary A table showing the average mark, time and steps for each task for each data set size. In these summaries, outliers were removed as in the graphs above, except that points in the time to complete summary were considered outliers if they were more than two times the average. The graphs show all points for completeness, but the summary tries to give useful results. Inspection of the graphs shows that nearly all points are within two times the average.

Learning summary Tables showing, where a task was set more than once in the test or tutorial, whether the participants improved or not over time.

The summaries were then used to chart other information, such as the scatter plots of time taken vs age and average mark vs age in Section 6.4.5. The data for each task is also presented in the following section.

6.4 Results

As explained previously, the results have been analysed for time to answer, percentage correct, and steps taken. This section first describes some general results for each analysis, explains how the data are presented, and then shows the results for each task.

6.4.1 Time to Answer

The first analysis of time to answer is shown in Figure 6.4 on the facing page. This chart shows how many participants completed the whole tutorial plus the experiment within a given number of minutes. The time to complete only takes into account the time to answer each question in the test, so some time after answering each question has been discounted. Completions are further broken down by the data set size that was completed. For instance, the first participant to complete finished in four minutes, on the 300 item set. The next completed in 5 minutes, using the 100 item set. Four participants finished in 10 minutes, on 100 and 300 item data sets. The last participant to complete took 44 minutes on a 700 item set.

As can be seen, the bulk of participants (74%) completed the experiment within 20 minutes, including all the 100 item participants, while a few took up to 45 minutes. All sessions that took more than 26 minutes were on data sets of more than 100 items.

[Figure 6.4 appears here: number of completions per minute against time taken to complete all tasks (0–45 minutes), colour-coded by data set size.]

Figure 6.4: Plot of times taken to complete all the tasks in the tutorial and the experiment. The colour key is used to indicate how large a data set was completed at each time. For instance, in the 6th minute, the tasks on one 300 and one 500 item set were completed.

[Figure 6.5 appears here: number of completions per minute against time taken to complete the tutorial (0–45 minutes), colour-coded by data set size.]

Figure 6.5: Plot of times taken to complete the tutorial in the experiment. These tasks are all measured on a data set of 300 items. The colour coding only indicates what size data set was used for the other questions in each trial.

[Figure 6.6 appears here: number of completions per minute against time taken to complete the main tasks (0–45 minutes), colour-coded by data set size.]

Figure 6.6: Plot of times taken to complete the main (non-tutorial) tasks in the experiment.

Figures 6.5 and 6.6 show this measure broken down into time to complete the tutorial, and time to complete the test tasks. The tutorial was always shown using the 300 item set, so the colour coding in Figure 6.5 simply shows what size data set the participant used in the following test.

The figures for time in Appendix D from Page 159 show the raw data for the time participants took to answer each type of question, from the time the question was presented to the time they clicked on their final answer, and so include reading and comprehension of the question. The first plot has been included as Figure 6.7 on the facing page. The data is plotted with data set size on the x axis and time in seconds on the y. Data was filtered to remove points that were more than ten times the set average. Plots were then made showing all correct answers as bullets (•), incorrect answers as deltas (∆), the average of the correct points as a dotted line, and one standard deviation from that average as horizontal marks. Further vertical lines at each data set size show the range of the first and third quartile of the set of correct answers. The number of correct and incorrect answers for each data set size is indicated towards the top of the graph. In Figure 6.7 on the next page there were 20 correct answers plotted for the 100 item data set, no incorrect answers, and no outliers.

Most graphs show that time to answer incorrectly is very quick. A closer look at the actual answers shows the usual reason for this was that the question was not answered at all, the participant having either gone on to the next question without attempting an answer, or having incorrectly thought the currently selected title was the correct answer.

[Figure 6.7 appears here: time to answer (seconds) against data set size for task EASY. Correct answers: 20, 24, 24 and 33; incorrect answers: 0, 4, 3 and 5; outliers removed: 0, 0, 1 and 0, for the 100, 300, 500 and 700 item sets respectively.]

Figure 6.7: Time to answer task EASY. Bullets (•) indicate correct answers, while deltas (∆) indicate incorrect answers.

6.4.2 Percentage of Correct Answers

The figures for percent correct in Appendix D show how often participants correctly answered each type of task. The first plot has also been included as Figure 6.8 on the following page. The data is plotted with data set size on the x axis and percent correct on the y. Data was not filtered to remove outliers. Plots were then made showing the percent of correct answers as a dotted line. The number of points used in this average is indicated at the bottom of the graph for each data set size. In Figure 6.8 on the next page there were 20 answers for the 100 item data set, with all users answering correctly. The 300 item set had an average mark of slightly less than 90% from 28 trials.

6.4.3 Steps Taken to Answer

This is a measure of the amount of ‘hunting’ the participant did to find the answer. A ‘step’ is a click, a search, using the slider (one ‘step’ if the slider is used while answering the question) or changing view, which was not a factor in this experiment.

The figures for steps taken in Appendix D show how many steps were required by the participants for each type of task. The first plot has also been included as Figure 6.9 on the following page. The data is plotted with data set size on the x axis and number of steps taken on the y. Data was filtered to remove points that were more than five times the average. Plots were then made showing the number of steps taken to answer. Correct answers are shown as bullets (•), and incorrect as deltas (∆). The average of the correct answers is plotted as a dotted line, and one standard deviation from that average as horizontal marks. The number of correct and incorrect answers in each data set used in the plot are indicated toward the top of the graph. In Figure 6.9 on the preceding page there were 20 correct and no incorrect answers for the 100 item data set. It took, on average, 1.5 steps to answer the task correctly, with a small deviation of about 0.2 steps.

[Figure 6.8 appears here: percent correct against data set size for task EASY, with n = 20, 28, 28 and 38 answers for the 100, 300, 500 and 700 item sets respectively.]

Figure 6.8: Percentage of participants giving correct answers for task EASY.

[Figure 6.9 appears here: steps to answer against data set size for task EASY. Correct answers: 20, 24, 25 and 33; incorrect answers: 0, 4, 3 and 5, for the 100, 300, 500 and 700 item sets respectively.]

Figure 6.9: Steps taken to correctly complete task EASY. Bullets (•) indicate correct answers, while deltas (∆) indicate incorrect answers.

6.4.4 Results by Task

The results for each task are given and discussed below. For each task I have provided a table showing, for each data set size and the tutorial,

• The average time to answer that task correctly.

• The average mark scored in the task. Since a participant either got the task right or wrong, this is an average of the number of correct answers over the number of participants.

• The average number of steps required to answer the task correctly.

Where there was a corresponding task in the tutorial, I give the figures for that task in the tutorial as well, as the first row in the table with the ‘Size’ of ‘300T’.

Task EASY

Figures D.13, D.14 and D.15 on Pages 165–166.

Task EASY involved selecting a title that was known to be well exposed, and therefore quite visible. It was thought that this would require a simple visual scan of the list for the item.

Size   time   avg %   steps   n
300T   8.5    81      1.0     57
100    6.5    100     1.4     20
300    7.5    86      1.3     28
500    6.9    89      1.4     28
700    12.4   87      1.7     38

The task was answered well and quickly, with minimal mis-steps, even in the tutorial. There was a slight degradation to below 90% accuracy on larger data set sizes (not including the tutorial), indicating a possible limit to the number of items that can be displayed. It also took longer to answer and more steps at 700 items, again showing that answering at these set sizes might be becoming difficult.

[Figure 6.10 appears here: time to answer (seconds) for task EASY in the tutorial and in the experiment proper. Tutorial: 46 correct, 10 incorrect, 1 outlier removed; experiment: 24 correct, 4 incorrect.]

Figure 6.10: Comparison between the time to answer task EASY in the tutorial, and in the experiment proper (300 item data set).

Figure 6.10 shows that participants did answer task EASY marginally faster after the tutorial, and with somewhat better accuracy. The answer patterns across the tutorial question (T) and the two main test questions (E1 and E2) support this: 40 participants answered all three questions correctly, 2 answered the tutorial question correctly but failed both the main test questions, and only one participant answered no EASY question correctly, with the remainder showing other mixed patterns. Most participants (48/57) did answer the EASY questions correctly after the tutorial, which suggests that the tutorial was effective for this task.

Task COUSIN

Figures D.16, D.17 and D.18 on Pages 166–167.

This task COUSIN involved selecting an item that was twice removed from the currently selected item. It required the participant to find a ‘peer of a peer’ of the currently selected item, usually by name. For instance, if Roman Holiday was a peer-peer of the current selection then a COUSIN task might be to Find and select Roman Holiday. There was no special tutorial on this task.

Size   time   avg %   steps   n
100    9.5    85      1.7     20
300    10.7   79      1.9     28
500    12.1   79      1.8     28
700    11.0   87      1.6     38

The task was answered well by participants, with time to answer, correctness and steps all not much more than those of the EASY task. Like EASY, the overall difficulty of answering does not rise much with data set size.

[Figure 6.11 appears here: time to answer (seconds) for task HARD in the tutorial and in the experiment proper. Tutorial: 48 correct, 9 incorrect; experiment: 19 correct, 9 incorrect.]

Figure 6.11: Comparison between the time to answer task HARD in the tutorial, and in the experiment proper (300 item data set).

Task HARD

Figures D.19, D.20 and D.21 on Pages 168–169.

In this task the participant must find and select a title that may have no relation to the currently selected title, and thus may be obscured. The expected method of completing the task is to use the search facility.

Size   time   avg %   steps   n
300T   49.0   84      2.5     57
100    12.8   95      2.3     28
300    14.5   68      2.1     28
500    13.5   71      2.2     28
700    17.5   92      2.1     38

Again, this was answered quite well, taking slightly more time and steps than COUSIN. A close look at the raw data shows that some wrong answers were due to the participant using the search facility to display the correct title, but then not selecting it before moving to the next question. In general the task took less than 25 seconds to answer, and was answered correctly about 80% of the time.

A strange effect can be seen in this task, where percentage correct was somewhat less for 300 and 500 items than for 100 and 700, except in the tutorial. There does not seem to be a way to explain this from the data gathered. It is possible, given the number of trials, that the participants were not of a similar average capability for each data set. Perhaps more careful screening and registration of participants would help decide whether this is true.

It seems from Figure 6.11 on the page before that the tutorial did help participants to answer HARD tasks more quickly. The answer patterns across the tutorial question (T) and the two test questions (E1 and E2) show that 35 participants answered all three correctly, while 7 got the tutorial example right, failed on the first test question, and then succeeded on the second. Most participants (48/57) had discovered how to answer task HARD by the last attempt.

Task RECSLIDER

Figures D.22, D.23 and D.24 on Pages 169–170.

This task was to test how well the slider could be used to find titles with minimal or maximal recommendation. An example of the type of question is Find and select the movie Gina will like most. The slider may help here. There was no directly analogous task in the tutorial, although the use of the slider was explained.

Size   time   avg %   steps   n
100    12.0   55      2.3     20
300    26.2   43      2.4     28
500    54.2   39      4.3     28
700    34.3   19      3.6     36

At low data set sizes finding the correct answer took less than twenty seconds. At 500 items this task takes up to 55 seconds to answer correctly, although that drops back to about 35 seconds at 700 items.

If three outlying results are taken into account, the average time to answer at 700 items is about 180 seconds. These results were assumed to be due to the participant being dis- tracted, and so were removed. Time taken to answer incorrectly remains fairly static at around 30 seconds.

Most of the time was probably spent ‘hunting’ around for the answer within cluttered areas of the display. The average number of steps required to answer correctly seems to confirm this, with 2.5–4.5 steps required to answer the task. As may be expected for a task that was so involved to answer, it was not answered correctly as often as other tasks. On the 100 item set, the task was only answered correctly 55% of the time. By 700 items, it was less than 20% of the time.

It seems that the hunting method of finding minima and maxima is not very effective when the number of items rises past a certain point.

Task CERT

[Figure 6.12 appears here: time to answer (seconds) for task CERT in the tutorial and in the experiment proper. Tutorial: 56 correct, 1 incorrect; experiment: 24 correct, 3 incorrect, 1 outlier removed.]

Figure 6.12: Comparison between the time to answer task CERT in the tutorial, and in the experiment proper (300 item data set).

Figures D.25, D.26 and D.27 on Pages 171–172.

This task involved selecting a title based on its certainty. For instance, the participant may be asked to find a title with a certainty > 80%. It was supposed that a participant would answer by scanning the list and picking out titles that started at a certain horizontal position.

Size   time   avg %   steps   n
300T   10.0   98      1.2     57
100    10.6   80      1.2     20
300    10.7   89      1.2     28
500    14.0   86      1.2     28
700    14.1   68      1.4     37

It was answered within 15 seconds on average, and with 80% or better accuracy except on 700 items. It rarely took more than one click, indicating that my presumption about how a participant would answer is probably correct. The time to answer did not depend strongly on data set size, with only a small jump in time to answer between 300 and 500 items.

It would seem that horizontal position is an easily discernible feature.

The answer patterns across the tutorial question (T) and the two test questions (E1 and E2) show that 38 participants answered all three correctly, and that almost all participants got the tutorial question right. Some participants (13/57) failed to correctly answer a task in the test after having correctly answered the tutorial question. It may be that the tutorial question was simply easier than the test questions: the tutorial question was ‘Find and select a movie that we aren’t at all certain whether Gina would like or not. (The title will be yellow)’, while the test questions were ‘Select any movie that we are uncertain whether Gina would like or not.’ and ‘Find and select the movie we are most certain about, no matter whether we think it’s good or bad.’

Figure 6.12 on the preceding page seems to suggest that the tutorial question was quite easy, with most answers for the tutorial being in the same range as the main questions, which is unusual within this experiment.

Task REC

Figures D.28, D.29 and D.30 on Pages 172–173.

This task involved selecting a title based on its recommendation. For instance, the participant may be asked to find a title with a recommendation > 80%. This is best tackled by looking for titles of a shade that indicates the mark desired. Since the desired title might be obscured, the participant may have to select some random titles to unclutter regions of the display, a technique not well described in the tutorial. The exact recommendation of the title under the cursor was displayed at the bottom of the screen, so a participant could find a title of about the right colour and then check their guess by moving the cursor over that title.

Size   time   avg %   steps   n
300T   42.4   65      2.1     57
100    21.0   80      2.4     20
300    21.2   86      2.3     28
500    31.2   93      3.0     28
700    42.7   53      4.8     36

The task was answered moderately well for smaller data sets. At 100 items it took 21 seconds to answer correctly, which was done about 80% of the time. Although it was still answered correctly more than 80% of the time up to 500 items, it took longer as the size increased. In particular, the number of steps taken by some participants to find a correct answer increased dramatically with data set size, as shown in Figure D.30 on page 173. It seems that hunting for the correct answer in an increasingly cluttered display took its toll. At 500 items it was taking around 30 seconds to answer correctly. At 700 items the percentage of correct answers dropped sharply to 53%, again suggesting a limit to the amount of data that can be displayed.

Note that the results are in all respects worse than in task CERT. This suggests that colour in a cluttered region does not work as well as horizontal alignment as an indicator in the display.

[Figure 6.13 appears here: time to answer (seconds) for task REC in the tutorial and in the experiment proper. Tutorial: 35 correct, 19 incorrect, 3 outliers removed; experiment: 24 correct, 4 incorrect.]

Figure 6.13: Comparison between the time to answer task REC in the tutorial, and in the experiment proper (300 item data set).

The REC questions comprised two tutorial questions and two test questions: one tutorial question explained the ‘recommendation’ measure and asked the participant to select a title accordingly (T1), and the other asked the participant to type in the recommendation value of a particular title (T2). One trial was removed from the analysis of answer patterns because not all relevant questions were answered. Of the remaining 56 participants, 25 answered all four questions correctly, with the rest spread over a wide variety of patterns. The patterns suggest the tutorial may have helped to some extent, and Figure 6.13 supports this, showing that the time to answer the task dropped between the tutorial and the main test. It is unclear from the patterns, however, whether this particular teaching strategy was effective.

Task CERTEASY

Figures D.31, D.32 and D.33 on Pages 174–175.

This task involved the participant selecting a title based on both its certainty (horizontal position) and its being a direct peer of the currently selected topic.

Size   time   avg %   steps   n
300T   34.3   33      1.1     57
100    36.4   30      1.0     10
300    25.1   7       1.0     14
500    39.3   29      1.0     14
700    102.3  17      1.3     18

The time between the question being presented and a correct answer being given was below 40 seconds for data set sizes up to 500, and about 100 seconds for 700 items.

[Figure 6.14 appears here: time to answer (seconds) for task CERTEASY in the tutorial and in the experiment proper. Tutorial: 19 correct, 38 incorrect; experiment: 1 correct, 12 incorrect, 1 outlier removed.]

Figure 6.14: Comparison between the time to answer task CERTEASY in the tutorial, and in the experiment proper (300 item data set).

The number of trials behind these figures is small (3 trials for 700 items, and only 1 for 300 items) because there were few correct answers. In fact, the task was answered correctly only 10–30% of the time, regardless of data set size. A correct answer was usually found in one step, except for the 700 item set, which took slightly more on average. Incorrect answers usually took more steps, between 1.25 and 3. This makes sense, because if a participant takes more than one step they lose the context for finding the answer to the EASY part of the question.

A close look at the logs shows that many incorrect answers (half the answers in the 100 item set) were the most certain title, rather than the most certain title that is also a direct peer of the current selection. It seems that many participants did not interpret the question as I intended. Perhaps the dual requirement posed too much load.

A lack of correct answers gave a small sample to study. That sample suggests that it takes participants 40 seconds or less to answer at smaller set sizes, but about 100 seconds to answer for 700 items.

The table below shows how often the CERTEASY questions were answered correctly in the tutorial (T) and in the test (E1); the last column gives the number of participants showing each pattern. As can be seen, most participants (33/56) did not ever get a correct answer. One participant was excluded from this measure as they did not attempt one of the questions. Of the trials that got something right, most successfully completed the tutorial question, but failed to answer the test question correctly. The tutorial question was Find and select the movie closely related to Roman Holiday that we have the most information on (i.e. we are most certain about our recommendation). (the title Roman Holiday had been selected in the previous question).

T   E1   n
         33
•        12
•   •    7
    •    4

Task CERTHARD

Figures D.34, D.35 and D.36 on Pages 175–176.

This task involved selecting a title based on both its certainty and its title, where the title was expressed as a substring. For instance, Find and select the movie with the most certain recommendation which contains the letters ‘it’.

Size   time   avg %   steps   n
100    25.0   90      2.2     10
300    28.1   71      2.1     14
500    33.1   93      2.0     14
700    52.1   74      2.2     19

The task took between 25 and 33 seconds to answer correctly for data sets up to 500 items, and about 52 seconds for 700 items. It was answered correctly 90% of the time for 100 and 500 items, and 70–75% of the time for 300 and 700. Correct answers were found in slightly more than 2 steps on average.

This task was answered quite well. In fact, the results for this task were substantially better than for CERTEASY. It is likely that the search stage of the task quickly and easily selects only a small number of titles that must then be scanned for certainty. The CERTEASY task asks a user to first visually scan and judge which are the peers of the current selection. The results for tasks EASY and HARD suggest that EASY is performed better. However, it seems from these results that it is easier for users to combine the HARD task with this task.

Task HARDREC

Figures D.37, D.38 and D.39 on Pages 177–178.

This task involves selecting a title based on a substring. For example, Find and select the most recommended movie with the letters ‘man’ in the title.

Size   time   avg %   steps   n
100    23.5   90      2.3     10
300    35.2   36      2.4     14
500    59.8   36      2.4     14
700    23.2   11      2.0     18

As explained later, the results for this task can be analysed in two ways. However, the time to answer for both methods is similar. Time to answer goes from around 25 seconds for 100 items, peaks at 50 seconds for 500 items, and falls to about 30 seconds for 700 items.

This task was similar to CERTHARD, but with the certainty condition replaced with recommendation. It took between 23 and 35 seconds to answer correctly for set sizes 100, 300 and 700, and 60 seconds for the 500 item set. It is unclear why the 500 item set took longer than the others. The task was answered well for the 100 item set, with about 90% correct. However, the percent correct quickly fell to less than 40% for 300–500 items, and almost 10% for 700 items. Steps required to complete the task correctly averaged 2.4–2.5 for up to 500 items, and then 2 for 700. Only two trials of 700 items were answered correctly, both in the minimum two steps.

The low percentage of correct answers seems dire. However, most of the incorrect answers were very close to correct. For instance, in the 700 item set, out of 18 trials, only two were answered correctly, but 9 were answered with a ‘common close’ answer, and three exhibited a common bug of not selecting any answer after a search (and so may or may not have been wrong). Only four trials were completely incorrect. The ‘common close’ answer arose because all data sets except the 100 item set had two answers whose recommendation differed by only 1%. While the correct answer for these sets was Smash-Up, the Story of a Woman (81% recommendation), many answered with Roman Holiday (80% recommendation). It should be noted that Roman Holiday is a well known movie that had been used previously in the list of tasks, perhaps ‘fooling’ the participants into preferring it over the marginally better answer. If we take Roman Holiday as a correct answer in all cases, the average score for 700 items rises to 56%, as shown in Figure D.40 on page 178.

Task EASYREC

Figures D.41, D.42 and D.43 on Pages 179-180.

Size   Time (sec)   Avg %   Steps    n
100       24.4        50     1.2    10
300       15.2        64     1.1    14
500       34.0        57     2.0    14
700       20.7        26     1.0    19

This task involved selecting a title based on its recommendation that is also a peer of the currently selected title. Find and select a movie that we think Gina would not like (recommendation < 40%) that is similar to Carrie, where Carrie had been selected in the previous task.

For 100 and 300 items, this was answered in between 15-25 seconds. It took less than 35 seconds for 500 items and about 20 seconds for 700 items. The correct answers were found in about one step, except for the 500 item set which took an average of 2 steps, mainly because of one participant who took 8 steps to answer correctly, skewing the average. However, the task was not often answered correctly: 50% for 100 items, about 60% for 300 and 500, and less than 30% for 700 items.

A review of the raw data shows that many incorrect answers were a ‘common close’ answer that was a cousin rather than a peer of the previously selected item. This answer was given seven times in the 700 item set, which is slightly more often than correct answers (six). In the other set sizes this mistake was less common. The 100 item set only shows one trial with this answer, out of ten trials. It could be argued that a ‘cousin’ is still similar to the selected title, and thus these answers were correct. If this line is taken, the scores for the 300, 500 and 700 sets change to those shown below. As can be seen, both time to answer and average score improve.

Size   Time (sec)   Avg %
300       13.6        71
500       30.5        64
700       17.1        32

Again, the results suggest that comparing recommendations becomes increasingly difficult as the set size rises above 300 items. It also seems it was particularly hard for participants to combine the two aspects of the task. The percentage of correct answers is lower than for either of the simple tasks (REC or EASY) that combine to make this one. Also, the difference between peer and cousin is not always totally clear, and ‘similar’ probably includes both.

Figure 6.15: Comparison between the time to answer task CERTREC in the tutorial, and in the experiment proper (300 item data set).

Task CERTREC

Figures D.44, D.45 and D.46 on Pages 180-181.

Size   Time (sec)   Avg %   Steps     n
300T      17.9        57     1.1    114
100       24.4        80     1.0     10
300       33.8        93     1.2     14
500       27.0        93     1.7     14
700       46.3        63     1.4     19

(The 300T row gives the combined results for the two tutorial questions, which used the 300 item set.)

Participants were asked to select a title based on both certainty and recommendation. An example question is Select any movie about which we are very uncertain (certainty < 30%) but think Gina would not like (recommendation < 50%).

This particular question was quite difficult, as it had two parts, both expressed as negatives. However, the task was answered correctly about 90% of the time for 300 and 500 items. For 100 items it was answered correctly 80% of the time, and 63% of the time at 700 items. Answering correctly took less than 35 seconds for data sets up to 500 items. The 700 item set took 46 seconds to answer correctly on average. The 100 item set took a single click to answer. ‘Hunting’ became more prevalent as the set sizes grew, with an average of 1.7 steps taken to answer correctly at 500 items.

The task was answered well and within a reasonable time for sets up to 500 items in size. In fact, it was answered more quickly than the REC task alone. This could be because the CERT task reduces the number of titles to be searched by recommendation.

The table below shows how often the CERTREC questions were answered correctly in relation to each other. T1 and T2 are the two tutorial questions and E is the single test question; a bullet (•) marks a correct answer.

T1   T2   E       n
     •    •      21
•    •    •      17
          •       7
•                 7
                  3
•         •       1
•    •            1

As can be seen from the E column, most participants (46/57) answered correctly in the test question. A large proportion of participants did not get the task right at first in the tutorial, but managed on the second attempt. This is all supported by Figure 6.15 on the preceding page, which shows a much better average mark (on a smaller data set), and a comparable time to answer.

ALL tasks

Figures 6.16, D.47 and 6.17 (Figures 6.16 and 6.17 are on Page 123).

Size   Time (sec)   Avg %   Steps     n
100       14.6        66     1.8    200
300       16.9        59     1.7    280
500       22.4        61     2.0    280
700       20.4        51     1.9    372

On single tasks (CERT, REC, EASY, HARD, COUSIN) the participants performed well on data sets up to 500 items. Percentage correct and time to answer correctly got worse in the REC task after 500 items as participants were forced to ‘hunt’ through clustered areas. The RECSLIDER task also reflected problems with the cluttering of the display on large data sizes.

REC tasks were answered more slowly and less accurately than CERT tasks, perhaps because human perception of position is more sensitive than colour differentiation in cluttered areas.

Overall there is a fairly steady trend towards lower scores and slower answers as the data set size rises.

6.4.5 Effect of Participant Age

Although no record was kept of participants’ identities, their reported age was logged (in all but one case) with the experiment log. This was used to plot a number of charts showing the effect of age, if any, on the participants’ ability to complete the tasks.

Figure 6.18 on page 124 shows all trials plotted with age of the participant on the x axis and time to answer both tutorial and experiment in minutes on the y axis. Different symbols are used to indicate the size of the data set for that trial, and a legend is given to show which symbol shows which size. For example, a dot (•) indicates the participant was given the 100 item set, a star (∗) indicates the 300 item set, a delta (∆) the 500 item set, and a cross (+) the 700 item set.

The graph shows that the bulk of the participants were below 30 years of age. Also, the random allocation of data set sizes had given the 500 item set to all three participants over 50 years old, and only one 100 item set to anyone over 30 years old.

Figure 6.16: Average time to answer, for each data set size.

Figure 6.17: Average number of steps to complete a task, for each data set size.

Figure 6.18: Plot of time to complete the experiment compared with the age of the participant.

Figure 6.19: Plot of mark obtained in the experiment as a percentage compared with the age of the participant. One participant declined to state their age.

Figure 6.20: Plot of mark obtained in the experiment as a percentage as achieved by participants below the given age. The dotted line shows the number of participants included at that point.

Figure 6.19 shows the percentage of correct answers in the experiment for each participant plotted against their age. As in Figure 6.18, different symbols are used to indicate the size of the data set in that trial. Figure 6.20 shows the average mark obtained by participants below a given age, for each data set size. From this graph it seems that participants below the age of 20 did equally well on the 100, 300 and 500 item sets, and badly on the 700 item set. In fact, there was only one participant younger than 20 years given the 700 item set. If we include participants aged younger than 30 years, we start to see that the average mark for the 700 item set is starting to fall, indicating that older participants had more difficulty with this set. By age 30 the dotted line shows us we have included about half of all participants. The only other marked age effect seems to be that the three participants older than 50 years had trouble with the 500 item set. Since no participants older than 50 years tried anything but the 500 item set, we cannot tell if this effect was the same for other data set sizes.

6.4.6 Other Participant Differences

There may have been participants who generally did badly, both in the tutorial and in the experiment proper. In general we would expect participants to take longer and do worse in the main part of the experiment, since it is longer and harder than the tutorial. We would also expect, given the results shown above, that the difference would increase as the data set size increases, so the difference between tutorial and main experiment for 100 items would be less than for 700 items.

Figure 6.21: Scatterplot of the difference in times between the tutorial and the rest of the experiment for each participant. The difference in marks is plotted on the y axis, so a participant that did 10% better in the experiment than the tutorial will be drawn at y = 10. Similarly, the x axis shows the time difference in minutes, so a participant that took 10 minutes longer to complete the experiment than the tutorial will be drawn at x = 10.

A comparison of the marks scored in the tutorial with the marks scored in the rest of the experiment, and similarly the time taken for the tutorial and the rest, is shown in Figure 6.21. The results are much as expected. The participant with the 300 item data set at (1, −60) used search far more than necessary in the trial proper, but rarely selected the title found by the search, thus giving an incorrect answer in a slow way. The other outlying participant at (23, −7) was simply not concentrating on the tasks. Some of this participant’s log entries had minutes between them.

There were a number of participants who did relatively poorly in both the tutorial and the main tasks. These are shown in the top right hand corners of Figure 6.22 on the facing page and Figure 6.23 on the next page. In the time comparison, one participant took 15 minutes for the tutorial and 30 for the main tasks. This participant is an exceptional case, both in that I remember which log was created by this person, and in that this person has a disorder of the brain (absent corpus callosum) which affects the ability to synthesise information. Similarly, the participant from the 500 item set who took 27 minutes to complete the main tasks is known to me, and has little computer experience. Another participant in this graph completed the tutorial in a normal amount of time, but took 29 minutes to complete the main tasks. This participant was described above as not concentrating on the tasks. The other serious outlier at (23, 14) seemed to do quite well in the tasks, but at times took long breaks, at one time taking 17 minutes to answer a question.


Figure 6.22: Scatterplot of the time to complete the tutorial on the y axis and the time to complete the experiment tasks on the x.


Figure 6.23: Scatterplot of the marks for the tutorial and the rest of the experiment for each participant (average mark for the main tasks on the x axis, average mark for the tutorial on the y axis). The marks are all from a limited set of values (5% increments for the main task marks for instance), and so are plotted with a small random displacement of up to 4% in both the x and y axes to avoid overprinting symbols.

Task         100   300   500   700   Time (sec)
EASY          •     •     •     •        20
COUSIN        •     •     •     •        20
HARD          •     •     •              20
RECSLIDER                                45
CERT          •     •     •              20
REC           •     •     •              35
CERTEASY                                 45
CERTHARD      •           •              35
HARDREC       •                          45
EASYREC                                  35
CERTREC       •     •     •              45
ALL                                      25

Table 6.1: Summary of results for the experiment. For each task, a bullet (•) indicates that the task was completed for a particular data set size within the time shown in the last column, with at least 70% accuracy.


In the mark comparison in Figure 6.23 on the preceding page we see one participant with an extremely poor (10%) average mark for the main tasks. This participant did attempt most of the questions, but possibly not with their whole heart, or perhaps just did not understand the display. Similarly, the participant at (28, 21) claimed in the feedback that they were not sure what they had to do and were confused. The participant doing the 500 item set at (22, 30) seemed to give up towards the end of their attempt. The participant with the 300 item set at (18, 60) stopped attempting questions early in the main tasks.

6.5 Summary

We now have results for all the tasks, and can draw some conclusions regarding the efficacy of VlUM 2.0 for each task. To do this we must first decide what an adequate level of performance is for each task. Let us say that a task was completed satisfactorily if the average mark was above 70% and the average time to complete was below a time set for each task. We shall not use the ‘Steps’ measure in this performance threshold, as it is accounted for in the ‘Time’ measure, and is more useful for analysing the strategies used to answer the task, rather than whether the task is answered adequately.

Table 6.1 shows the results of this comparison. On the left the tasks are listed. On the right are what I believe to be acceptable times in which to complete each of the tasks. A bullet (•) indicates the standard was met for that task. The ‘Time’ measure is shown in seconds, and includes the time to read and comprehend the question, which we take to be five seconds for a simple task and 15 seconds for a task involving two aspects. The benchmark for the time is set at a time I consider reasonable to complete such a task in general. For instance, the task HARDREC, to find a node in a graph such that it has a given substring and recommendation, in a graph of greater than 100 items, will often take more than 40 seconds including reading and understanding the question.
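As a minimal sketch of this pass/fail rule (the class and method names here are assumptions for illustration, not part of the thesis software), the criterion applied to each task and data set size could be expressed as:

    // Hypothetical encoding of the Table 6.1 criterion: a task passes for
    // a data set size if the average mark is strictly above 70% and the
    // average time to answer (which includes reading time) is within the
    // task's time budget.
    public class Threshold {
        static boolean satisfactory(double avgMarkPercent,
                                    double avgTimeSeconds,
                                    double timeBudgetSeconds) {
            return avgMarkPercent > 70.0 && avgTimeSeconds <= timeBudgetSeconds;
        }

        public static void main(String[] args) {
            // CERTREC on the 500 item set: 93% correct in 27.0 s, 45 s budget.
            System.out.println(satisfactory(93.0, 27.0, 45.0)); // true
        }
    }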

Overall, it seems that participants did well in all data set sizes except 700 items. There were some exceptions, for instance RECSLIDER, CERTEASY, HARDREC and EASYREC. In RECSLIDER the need to ‘hunt’ for the minima or maxima proved not to be an efficient means of search, and participants tended to get an incorrect answer. CERTEASY was misunderstood by a number of participants, again leading to a low number of correct answers. HARDREC was answered better if a leeway of 1% is allowed in the answers, although still not well enough to make the 70% accuracy threshold. EASYREC showed that sometimes it becomes difficult for participants to distinguish between a peer and a cousin. The problems with these tasks dragged the average mark for ALL tasks to below 70%. By 700 items, only the navigation tasks were being completed within the given time and accuracy constraints. However, it should be noted that in the CERT task the result for 700 items was not very much worse than for the others, although some rethinking of the display might be needed to increase performance for REC and related tasks at 700 items.

The experiment was designed with the assumption that VlUM 2.0 would fail to be usable at 700 items. However, this was not the case. Indeed, there is not a steady degradation of performance over the data sizes that would allow us to predict where VlUM 2.0 finally fails. It would seem that, at least for basic tasks, VlUM 2.0 could scale well beyond the data set sizes used in this experiment. It may even be the case that, with careful tuning of some constants in the layout algorithm, at least three levels of the spanning tree are visible for any data set size, in which case perhaps the computational realities of displaying such sets might become the limiting factor.

Chapter 7

Conclusions

VlUM was developed to provide an interface which allows users access to their large user models over the Web. As stated in Chapter 1, there is a pressing need for systems that help a user access the personal data that an on-line system may have collected about them. Several reasons for scrutable user models are given in Chapter 2, including:

• Access to and control over personal information. This has become a legal issue in the European Union, and at the time of writing is a topic of debate in the U.S.

• Programmer accountability.

• Correctness and validation of the model.

• Aid to reflective learning.

At the same time, we see the emergence of user modelling shells, which support the reuse of user models across applications, and across user sessions. At least one user modelling shell has also provided tools for allowing a user to inspect their model [Kay99]. However, the UM tools are targeted towards the inspection of details of a small model, rather than the navigation and exploration of large models. As stated in Section 2.1.2 on page 14, a visualisation of a user model should show:

• Overview;

• Relevance of model components;

• Navigation paths;

• Component values;

• Component value confidence;

so that a user may ask the questions given on page 2 about the beliefs held by the user model.


Furthermore, VlUM is designed to be used within Web sites, and run on generic hardware and operating systems. It therefore had other design constraints, including:

• A generalised user model format;

• Adjustable ‘category boundary’;

• Usable with more than 100 items;

• Usable on an average web browser;

• Usable over slow networks;

• Usable in a small screen space;

VlUM was designed iteratively, with the evaluations of two earlier implementations influencing the design and implementation of VlUM 2.0. VlUM 2.0 was evaluated as described in Chapter 6. Some 58 participants completed tasks that should be available at the interface of an overview of a scrutable user model. They were able to find answers to the general questions listed on page 2.

7.1 Future Directions

The results of the final evaluation provide some basis for improvements. Similarly, the user model format, while flexible, could be still more general and able to represent more complex user models.

7.1.1 User Model Representation

The user model as implemented here is a single file containing components. The components can have an associated ‘data set’, which can contain a name, a floating point component value, and a floating point component confidence. The component can also have references to other components. This was certainly adequate for the experiments in this thesis, but in future a number of changes could be made:

• Peers should be in an rdf:Bag collection. This would assert more formally that the peers are an unordered collection of similar resources (a serialisation along these lines is sketched after this list).

• Peers could go in a separate file, to be integrated into the model on demand. This would allow the components to be related in different ways at the user’s request.

• The relations between the components could be dynamic. For example, in the movie domain a user may decide that they are much more interested in how movies are related by director than actor. They may be given a control to adjust how the graphs are constructed so that they can set their preference for directors over actors. To display these changes in real-time, the graph would have to be constructed at the client, using extra metadata in the components themselves.

• UM components have a notion of ‘type’, which was not implemented here. The model could add such an attribute to the component. Showing this in the visualisation could add unwanted complexity, but it could perhaps be shown in the status bar of the display.

• To ensure that user models from different systems are able to inter-operate within an RDF data model, the ontologies used in the metadata of the model must mesh. Although some standard definitions exist for some metadata, any serious effort to popularise an RDF user model format would require the acceptance of a standard set of terms for user models.
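As a rough sketch of the first two of these points, a component’s peers could be serialised as an rdf:Bag, possibly in a file separate from the component values. The vlum: vocabulary and the URIs below are invented for illustration; they are not the schema actually used by VlUM:

    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:vlum="http://example.org/vlum#">
      <rdf:Description rdf:about="http://example.org/movies/sabrina">
        <!-- one 'data set' entry: component value and confidence -->
        <vlum:value>0.81</vlum:value>
        <vlum:confidence>0.65</vlum:confidence>
        <!-- peers asserted as an unordered rdf:Bag, as proposed above -->
        <vlum:peers>
          <rdf:Bag>
            <rdf:li rdf:resource="http://example.org/movies/roman-holiday"/>
            <rdf:li rdf:resource="http://example.org/movies/dead-end"/>
          </rdf:Bag>
        </vlum:peers>
      </rdf:Description>
    </rdf:RDF>

If the bag lived in a separate document, merging it into the model on demand would only require the two RDF files to share the component URIs.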

7.1.2 Visualisation

The current visualisation serves its purpose well. However, there were some areas in both design and implementation that could be revisited.

Display & Interface

VlUM 2.0 has been tested with models of up to 700 components. This was well outside the range of the current literature; the previous study with the most components was the SAM [CK94] study, showing 100 components, most of which were not visible at any given time. The studies in this thesis extend upwards from 100 components, and 700 was chosen as a suitable top end because it was thought that VlUM 2.0 would start to seriously fail on models of that size. As it happens, VlUM 2.0 can handle 700 items quite effectively, although the response time on older Java virtual machines becomes slow. It would be worth testing larger models on newer, faster machines in the future to see where the VlUM 2.0 display does completely fail.

It may be that VlUM 2.0 does not ever completely fail. The component placement algorithm can be tuned, and it may be possible to make sure that three levels of the spanning tree are visible no matter how large the data set becomes. It is then the ‘cluttered’ areas that become unusable as the data set size grows, and as graphics display hardware improves the visibility of individual components in these cluttered areas could be enhanced by using transparency. The use of transparency could be extended to ‘fade’ components that do not match a given condition, for example movies that are not in English.

Alternatively the model could be partitioned into subareas. In the medical domain this might be topic areas like anatomy or physiology. The movie domain could use genres. Such partitioning could reduce the efficacy of the model if there is much interrelation of components between the partitions, but the approach is worth pursuing in future. Another possibility is to use a double-slider, as seen in Figure 2.8 on page 22. A user could ask the tool to show components with values within the specified range, which would reduce the clutter of the display. In general, there are a number of potentially useful search operations that might be employed in future. For instance, a single slider could set a ‘value of interest’,

and components with a value near this would be shown with full opacity, while components with values further away would fade into transparency.
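A minimal sketch of this fading idea, assuming a Java 2D renderer of the kind VlUM already uses (the class and method names are invented for illustration, not taken from the VlUM source):

    import java.awt.AlphaComposite;
    import java.awt.Graphics2D;

    // Sketch of the proposed 'value of interest' fading: opacity falls off
    // linearly with distance from the slider value, with a floor so that
    // faded titles remain faintly visible.
    public class InterestFade {
        // Map a component value to an opacity in [0.1, 1.0].
        static float opacityFor(double value, double interest, double falloff) {
            double alpha = 1.0 - Math.abs(value - interest) / falloff;
            return (float) Math.max(0.1, Math.min(1.0, alpha));
        }

        static void drawTitle(Graphics2D g, String title, int x, int y,
                              double value, double interest) {
            g.setComposite(AlphaComposite.getInstance(
                    AlphaComposite.SRC_OVER, opacityFor(value, interest, 0.5)));
            g.drawString(title, x, y);
        }
    }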

Implementation

The layout algorithm as it stands does not adapt easily to different numbers of components. Indeed, in the tests in Chapter 6 I tried to hand-tune the value governing the amount of space given to components. A better algorithm may be found if hyperbolic space is used, or another of the distortion-oriented techniques discussed in Section 2.2.3 on page 25.
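To make the tuning problem concrete, here is an illustrative sketch (not the actual VlUM layout algorithm) of allocating vertical space by spanning-tree distance from the selected component, using the kind of hand-tuned decay constant discussed above:

    import java.util.List;

    // Illustrative space allocation: each title's share of the display
    // height decays geometrically with its spanning-tree distance from
    // the selected node. The decay constant is the hand-tuned value.
    public class SpaceAllocator {
        static double weight(int treeDistance, double decay) {
            return Math.pow(decay, treeDistance); // decay in (0, 1)
        }

        // Scale weights so they fill the available pixel height.
        static double[] rowHeights(List<Integer> distances, double decay,
                                   int pixelHeight) {
            double total = 0.0;
            for (int d : distances) total += weight(d, decay);
            double[] heights = new double[distances.size()];
            for (int i = 0; i < heights.length; i++)
                heights[i] = pixelHeight * weight(distances.get(i), decay) / total;
            return heights;
        }
    }

With a fixed decay the number of clearly visible tree levels falls as the component count grows; making the decay a function of the model size is one way such an algorithm could adapt automatically.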

In the years since the first implementation of VlUM, some generally useful toolkits for handling RDF have become available. These allow parsing, storage, and some querying. It would be worth evaluating these for use in any future implementations of VlUM.

Network Usage

Models can be large. The model for the online assessment domain, which contains 3 data sets and 498 components, is 545 kb in size. The model file for 700 movies with 1 data set is 444 kb. Also, while the VlUM 2.0 application itself is quite small (31 kb), it uses a number of support libraries to ensure compatibility with older web browser Java implementations. These support libraries total more than 3 Mb. Thus running VlUM 2.0 involves the download of almost 4 Mb of files, which could take up to 10 minutes on a 56K modem (4 Mb is roughly 32 megabits, which at 56 kbit/s takes nearly 570 seconds before overheads). This is clearly too long. Thankfully, newer web browsers use the ‘Java plugin’ instead of their own Java implementation. This plugin provides a more complete and up to date implementation of the Java runtime, which includes most of the support classes currently downloaded. At the time of writing it would be possible to reduce the download to less than 1 Mb. This is still somewhat large for users with a 56K modem, but quite usable at higher network speeds.

Future work on VlUM 2.0 might deal with changing user models. This would require some redesign, as the current design works to load the model once up front to reduce the number of (slow) network requests during the VlUM 2.0 session itself. It would be possible with some internal changes to keep a network connection open listening for changes to the model.

Evaluation

The evaluation in Chapter 6 on page 99 tests enough of VlUM to say that users can use it to find answers to common questions asked of scrutable user models. There are features of VlUM that have not been evaluated, including the improved ‘stretching’ facility, and comparisons of datasets. The final evaluation used data from the movie recommendation domain described in Section 3.2 on page 54. This domain had an important advantage in that it was broadly accessible, which allowed us to test on a user population from a broad range of backgrounds, knowledge, computer expertise and education levels. Therefore the results of the evaluation had broader relevance.

A good avenue of future evaluation would be to test VlUM 2.0 in an epistemological domain, such as the medical domain described in Section 3.1.3 on page 51. A user test in this particular domain would need to be done with medical students, and having such a controlled and known user population would allow further comparisons of task performance against academic records, age, sex, and other demographic data. However, this would mean that the results may not be as broadly relevant to other levels of educational achievement. It may therefore be desirable to test VlUM 2.0 on student populations in a range of teaching domains.

7.1.3 Further Uses

The work in this thesis has shown VlUM 2.0 working on a strictly limited set of problems in only two domains. It can be argued that VlUM 2.0 can be used in other problem domains, and be coherently extended with interesting functionality.

Other Application Domains

VlUM 2.0 is quite capable of displaying any graph. At the moment, it is coded to understand a particular graph structure in which displayed nodes are connected by a ‘peer’ property to other displayed nodes, and are linked by ‘results’ attributes to information about the mark and reliability for the displayed node. Data from other domains could be displayed in VlUM 2.0 either by providing an RDF file in this format, or by changing VlUM 2.0 to adapt to different structures, possibly by using metadata in the RDF file itself.

For instance, a project management application could provide an RDF file containing the tasks involved in a large project and their dependencies. Each task could have a ‘score’ that indicates progress on that task, and a ‘reliability’ of that score. Project managers could then find and focus on problematic tasks, and see how they might affect their dependent tasks.
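Such a file might look like the following hypothetical model in the VlUM-style format sketched earlier (the pm: vocabulary and URIs are invented for illustration):

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:pm="http://example.org/pm#">
      <rdf:Description rdf:about="http://example.org/tasks/integration-test">
        <!-- 'score' plays the role of progress, 'reliability' the confidence -->
        <pm:score>0.35</pm:score>
        <pm:reliability>0.80</pm:reliability>
        <!-- dependencies take the place of the 'peer' relation -->
        <pm:peer rdf:resource="http://example.org/tasks/build-server"/>
        <pm:peer rdf:resource="http://example.org/tasks/release"/>
      </rdf:Description>
    </rdf:RDF>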

Of course, some domains do not have a natural graph structure, or have a graph structure that is so interconnected or inhomogeneous that VlUM 2.0 would not be able to make use of it for navigation. The problems with heavily interconnected or inhomogeneously connected graphs in VlUM 2.0 were exposed in the experiment described in Section 4.3.3 on page 69.

Temporal Comparisons

One aspect of the VlUM 2.0 model not tested in Chapter 6 on page 99 was the comparison of data sets. For instance, the model for a medical student includes the current results, and also the results from the previous month. It is possible to see a comparison of these results, and thus find areas of improvement or regression, which are shown as the ‘score’. Components in which the student is getting more questions right are green, and components in which the student is forgetting and getting more incorrect answers are shown as red. This facility is limited in the current implementation by two main factors.

• The very simple model for the medical domain does not model regression very well, being only an average score to date.

• There is no real abstraction for ‘time’. In the case of the medical data, we only have two data sets, and a general mechanism for comparing data sets. It would be better to have a more specific mechanism and interface for studying the change of the data over different time periods. Of course, the inclusion of such time series or model update data in the RDF file itself increases the file size dramatically.

It might also be possible to use the x axis to display temporal change. For instance, if the VlUM 2.0 display were wider, a line, or series of points could be drawn to the right of the display adjacent to each title. The colour of the points could show the progression of the score over time. If it was thought important to see the change of certainty over time as well, it would be possible to simply line up multiple instances of VlUM 2.0, each representing the model at a particular time, and link their selection mechanism. Thus a student could see their model from a number of dates in full, and selecting a component could cause all the displayed models to update in unison.

Adding to the User Model

As a user interacts with VlUM 2.0, they are providing data that could be fed back into their model. For instance, repeated visits to a particular node probably indicate a particular ‘interest’ in that node. In the movie domain, this might lead to the node becoming more certainly recommended. Alternatively, such data might be used to supplement a ‘bookmark’ facility in VlUM 2.0.

Privacy Considerations

In all user tests in this thesis, a single model was generated and used for all subjects. In the movie domain, this was necessary as there was no individual data. In the online assessment domain there was individual data, but it was preferable to run the user tests on a single user’s model to reduce variables in the experiment. In a true implementation it would be necessary to generate an individual model for each user. However, for privacy reasons, it is important to ensure that only that user can access their model.

When the online assessment system was being designed it was felt that for the system to be effective as a formative assessment tool, no student should have reason to fear being assessed on the data collected by the system. It was therefore decided that the implementation would ensure that only the individual student would ever be able to access their data. The only public data, and, importantly, the only data seen by the staff, would be aggregate cohort data. The system implements an authentication scheme to ensure this policy.

It might be interesting in future to relax this privacy policy at the whim of the user. For instance, by using VlUM 2.0 a student could identify areas of their model they are comfortable for others to see. In the assessment context this could lead to some interesting social developments, where showing or hiding sections of your model is seen to indicate how you are performing. However, in the broader context VlUM 2.0 could be an important tool in deciding what information in your model to release to the outside world. For instance, using a more general user model, VlUM 2.0 might be able to show someone aspects of their model that are useful to release as ‘certainty’ and aspects of the model that affect privacy in colour. The user could then make informed decisions about what information to release.

A further role of VlUM 2.0 might be in showing what inferences can be drawn from a low level user model, and how that might affect privacy. For instance, a web advertising company may simply collect the browsing history of a user, which the user may not see as a breach of privacy. But from it they may be able to infer information that the user would prefer to remain private. It may be possible to use VlUM 2.0, along with additional tools, to give an overview of the inferences such a company might be able to make from the large raw model.

7.2 Contributions

There are five major contributions of this thesis.

Creation of a Visualisation for Large User Models A visualisation of a large, generic user model was created, refined and implemented. The visualisation

• gave an overview of the model, allowing a user to find important points in the model. • showed the component names, values, and certainty, as well as relationships between components. • allowed users to set the ‘category boundary’, which they could interpret as an important ‘turning point’ in the belief values. VlUM then adjusted the display to match their choice. • allowed users to compare their model with other models. For instance, they could compare themselves to their cohort, or an expected model.

The visualisation was tested on models up to 700 items in size. It succeeded in allowing navigation and exploration of models containing at least 300 components.

Evaluation of the Effectiveness of the Visualisation The visualisation was tested on 58 participants, who were able to answer a number of useful questions from the model. As summarised in Section 6.5 on page 128, participants were able to find individual components, find relationships between components and find components of given certainty and belief value for models of 500 components. The navigation tasks were still being completed well for models of 700 components. To support this evaluation I designed and implemented a flexible, reusable environment for evaluating Web based tools.

Creation of a Flexible, Reusable User Model Representation The VlUM tool has been used to show user models containing both performance and recommendation data. The model format can represent all aspects of a user model required for this, and is easily extensible if other tools require more attributes.

Evaluation of Different Model Graphs The results of the second evaluation of VlUM, given in section 4.4.1 on page 71, show how sensitive the VlUM visualisation is to the connectedness of the graph. Homogeneous graphs are much easier to navigate in this display. This caused problems in early attempts to model the Online Assessment domain, as described in Section 3.1.3 on page 52, but was overcome to a large degree by combining multiple peering methods.

An Efficient Implementation of the Above, Usable on Web Sites The tools described in this thesis can be, and have been, implemented within a web site, and operated over a ‘long thin pipe’, or network of limited bandwidth and noticeable latency. The VlUM tool has been used quite extensively on standard consumer-grade hardware and software.

In Chapter 1 I discussed the importance of allowing for scrutinising of user models. The goal of this thesis was the design of an interface for scrutinising large user models, particularly suited to use on the Web. Achieving this required a new user model representation, a new visualisation design, and an efficient implementation. To ascertain whether the goal was met, the tool was evaluated, and the results showed that users were indeed able to access their user model and find useful information from the overview.

Appendix A

Learning Topic Example

What follows is the content of a learning topic. Formatting here is not as on the GMP web site (which is in any case not fixed).

LEARNING TOPIC - PRINCIPLES OF DIALYSIS

ROBYN J CATERSON: Renal Physician, Dept of Renal Medicine, Royal North Shore Hospital

KEYWORDS: membrane, solute size, dialyser, peritoneal dialysis, haemodialysis, haemodiafiltration, renal failure, vascular access, biocompatibility, adequacy, nutrition, complications

Dialysis is a process whereby the composition of a solution is altered by exposing it to another solution across a semipermeable membrane. Small solutes can pass through the membrane’s pores but larger solutes (such as proteins) cannot pass through the semipermeable membrane.

Dialysis of the blood can occur in two ways: haemodialysis, using a dialyser as the filter with the semipermeable membrane, and peritoneal dialysis, where the peritoneal membrane acts as the semipermeable membrane. In haemodialysis the blood runs through the dialyser in one direction and the dialysate fluid in a counter-current direction. The dialysate is made up to allow passage of certain solutes in each direction via a concentration gradient. Fluid can be removed by applying pressure across the membrane. This process is called ultrafiltration (UF). Haemodiafiltration is another process used where the filter has larger pores and the movement of solutes occurs by convection. This requires a special machine with on-line return of the large quantities of fluid removed by this procedure. In peritoneal dialysis the small capillaries in the peritoneal membrane allow diffusion of solutes when dialysate is added to the peritoneal cavity. Ultrafiltration of water occurs by osmotic ultrafiltration depending on the concentration of glucose in the dialysate.

Dialysis or haemofiltration can be used to treat acute and chronic renal failure. The decision to start dialysis is usually clinical, depending on the symptoms of the patient, biochemical values in the blood profile, and the availability of the different types of dialysis.

Haemodialysis for CRF is usually done 3 times a week for 4-6 hours each dialysis. Blood flows into the dialyser from the patient via a vascular access which can be formed in a variety of ways including temporary vas-cath, A-V fistula and grafts. Dialysers differ in size, pore size and membrane composition. The more expensive synthetic dialysers have a larger pore size and are more biocompatible but lead to greater ultrafiltration, needing a machine with UF control.

Peritoneal dialysis is usually done by C.A.P.D. (Continuous Ambulatory Peritoneal Dialysis). This entails doing 4-5 exchanges of 2-3 litres of dialysate via a peritoneal catheter into the peritoneal cavity daily. The number of exchanges and the amount of fluid in each exchange depends on the size of the patient and the residual renal function. Sometimes automated peritoneal dialysis is used overnight. This is a more expensive option and is used particularly in children in this country.

The decision as to which therapy is used in each patient depends on a number of variables including: availability, patient preference, patient location, size, underlying medical condition and residual renal function.

The major complications associated with haemodialysis are vascular access difficulties, cardiac disease, bone disease and β2 microglobulin deposition. The major complications associated with peritoneal dialysis are peritonitis and exit site infections, cardiac disease and difficulties with adequate dialysis once residual renal function disappears. Malnutrition can occur with both forms of dialysis and it is important that a good dietary protein intake is maintained, as malnutrition is associated with high morbidity and mortality.

REFERENCES

1. Use the textbooks in your Tutorial Room

2. Optional References: Dialysis is obviously a specialised area and most general medical texts do not cover it well. Of the general texts I would suggest, for a short review: Oxford Textbook of Medicine (Third Edition), Editors DJ Weatherall, JGG Ledingham and DA Warrell, Oxford Medical Publications, Pages 3306-3313. A good, easy to read short text on the basic principles of dialysis is: Handbook of Dialysis (Second Edition), John T Daurgirdas, Todd S Ing, Ed Little, Brown. A much more detailed text on dialysis is: Replacement of Renal Function by Dialysis, Editors Claude Jacobs, Carl Kjellotrand, Karl M Kock and James F Winchester, Kluver Academic Publishers. This book is set out clearly in sections and chapters. I am not suggesting any particular pages as the student can choose what sections might be of interest to them. The book should be in a number of medical libraries or renal units in different hospitals.

Appendix B

Online Assessment User Surveys

B.1 First User Survey

The first user survey was a simple email to five students known to the author. The mail, and the responses from the three students who replied, are reproduced below. Student names have been changed to ‘Student A’, ‘Student B’ etc. to protect the guilty.

B.1.1 Mail to Students

Dear friendly and at least vaguely IT aware GMP student (sorry - no personal addressing at this stage). My PhD research involves integrating a user modelling system into the GMP website and in particular the Online Assessment system. There are two prime reasons this is a Good Thing.

1) If the online assessment system has some sensible idea of how you are going and how the rest of your year/group/whatever are going it should allow you to better gauge your progress.

2) If the system knows where you’re weak compared to others in your position it can try and be more intelligent about choosing questions to help you.

Right now i have to work out just how valid the data we have to use is, and what sort of comparisons may help, so i was wondering if you could answer the following questions. Results will be sent back to you when I’ve sorted them all out.

1) Do you ever use the Online Assessment system with others? (if so then your answer is a group effort and i have to take that into account)

2) Would you be interested to know how you were going compared to

a) your year

b) your group

c) all students

d) what the staff think is a good standard at that stage in the course

e) all students as of that stage in the course

f) anything else?

3) What other questions would you suggest i ask if i were to extend this survey to all students?

4) anything else?

thanks
James Uther - GMP web programmer

B.1.2 Student Responses

Hi James,

> 1) Do you ever use the Online Assessment system with others? (if so
> then your answer is a group effort and i have to take that into
> account)

Yes, occasionally. But it will always be under one members login as we don’t have a group login.

> 2) Would you be interested to know how you were going compared to
> a) your year

Yes

> b) your group

No - too easy for competitiveness to arise.

> c) all students

Yes

> d) what the staff think is a good standard at that stage in the course

Definitely - I think this most important as it doesn’t encourage competition with others.

> e) all students as of that stage in the course

Yes

> f) anything else?

No

> 3) What other questions would you suggest i ask if i were to extend
> this survey to all students?

Nothing comes to mind.

> 4) anything else?

Nothing comes to mind.

Regards, Student A.

————-

James,

That sounds like a great idea for a research topic.

Many have been the times when I have lamented that the voluntary assessment system did not provide comprehensive and intuitive feedback concerning my various flaws and weaknesses.

Whilst it seems that the introduction of a statistics page is great in principle it is still extremely primitive in its ability to give practical and informative critical review of one’s learning.

Could you be a little more specific as to how I can help give you an idea of ’how valid the data is’?

As to some of those questions - no I don’t use it as a group.

I would like to know how I am going v. the year, the gp, the profs’ etc.

What appeals to me is the idea you mentioned of the system being ’intelligent’ enough to tailor questions to a student’s areas of weakness.

Love your work,

Student B.

————-

HI James,

happy to help

> 1) Do you ever use the Online Assessment system with others? (if so
> then your answer is a group effort and i have to take that into
> account)

Yes. when i can actually be bothered to do it:-)

> 2) Would you be interested to know how you were going compared to
> a) your year
> b) your group
> c) all students

> d) what the staff think is a good standard at that stage in the
> course
> e) all students as of that stage in the course

yes to all.

> 3) What other questions would you suggest i ask if i were to extend
> this survey to all students?

would we prefer case by case scenarios. I know I would


Figure B.1: I use the Online Assessment system on my own (0 = often, 5 = never).

B.2 Second User Survey

The second user survey was done as a web page. There were 177 respondents. Below are graphs of the responses to the relevant questions.


Figure B.2: I use the Online Assessment system with others (0 = all of the time, 4 = never).


Figure B.3: The format of questions suits my self assessment and learning style (0 = always, 4 = never). B.2 Second User Survey 147


Figure B.4: On average I spend the following amount of time on any one ‘session’ (0 = more than 30 minutes, 5 = less than 10 minutes).


Figure B.5: The questions are relevant to the problems (0 = never, 4 = often).


Figure B.6: The explanations of why answers are right or wrong are useful for my learning (0 = never, 4 = often).


0 1 2 3 4 Answer

Figure B.7: If I have a comment or query about a question, I use the feedback button (0 = often,4=never). Appendix C

Movie Domain Question File

<h2> Welcome! </h2> Welcome to the VLUM evaluation! We are evaluating a new way of displaying a large number of movie titles and recommendations about the movies all at the same time to help people find movies they would like. To do this we need to ask you some questions and you have to try to answer them using the display on the left. Once you have answered each question, you can move to the next page.
<p>
Each question page has a box down the bottom for feedback. You can type whatever you like into that box.
<p>
Use the -->> link below to move to the next page.

What is your age? Answer in the box below, and then use the -->> link below to move to the next page.

As you use this program, we’ll be logging what you do so we can work out how well people can use it. But we won’t be telling you how well you are going! So we’ll ask you to do things, but we won’t comment on whether you did the right thing. And remember, we’re testing the software, not you! Just do your best.


We’d like to first introduce you to the interface by taking you through a few simple examples using the movie data.


<script> parent.browser.document.squidgeApplet.resetSquidge(); </script>
In the frame on the left is the movie list, with movie titles ordered from oldest movies near the top to most recent at the bottom. As you can see, <em>

Directed by William Wyler </em> has been selected.

As you move your mouse around the list of movies, the movie title your mouse is over changes colour to white, and its title and recommendation (if any) appear in the status bar at the bottom of the window. Move the mouse around a bit and see for yourself. If the titles are close together you may have to move the mouse around a bit to select the movie you want.

pymark="movie==’Sabrina’"> <script> parent.browser.document.squidgeApplet.resetSquidge(); </script>
Click on the movie <em> Sabrina </em> (near the middle of the list).

It becomes the selected movie, and its relations are exposed. Movies are related by their similarity in cast, writer and director. You can see that the movies <em> Dead End, The Remarkable Andrew, Witness for the Prosecution, The Love Lottery, Love in the Afternoon, </em> and <em> Directed by William Wyler </em> are related to <em> Sabrina </em> because they are bigger, brighter, and more spaced out than other titles. Slightly smaller and duller titles like <em> </em> near the top of the display are less similar, and small dull obscured movies are unrelated to the selected movie.

The movie <em>Sabrina</em> is green. This means, in this data set, that we think that the person this data is from <strong> (which is not you, let’s call her Gina)</strong> might like the movie. The more intense red or green (i.e. the more ‘fluro’), the better or worse we think Gina will like the movie. If you move the mouse over <em>Sabrina</em>, its title and a percentage indicating how much we recommend the movie appears in the status bar at the bottom of the browser window.

Find and select a movie we think Gina would not like at all (say, has a recommendation of < 40%).

reason="tutorial,rec,numericinput" type="input" cols="7"> How much do we recommend the movie you selected? (it is the percentage in the status bar). Type the answer into the box below.


The movie you have selected is indented somewhat from the left of the screen. The more indented the title, the less certain we are about our decision as to whether Gina would like the movie. So if the title is flush left, we’re quite certain. If however we really have no information about whether Gina would like the movie or not, the movie is placed one sixth of the width of the frame to the right and is coloured yellow.


Find and select a movie that we aren’t at all certain whether Gina would like or not. (The title will be yellow)


Find and select a movie the system is very uncertain about Gina’s preference for liking it, but the information it has strongly points to her liking it.

reason="cert,rec,tutorial"> Now find and select a movie that we are certain Gina would like (i.e. the movie is shifted to the left, and is very green.)

We can search for movie titles. Under the ‘Action’ menu, there is a ‘Search Titles’ option which gives you a dialog box you can use to search for titles. You can search for any regular expression or substring. For instance, "man" will match "many" or "manager" or "normandy".
<p>
Use the search dialog box to find all movies with "day" in the title. Select <em> Roman Holiday </em> by clicking on it.


Find and select the movie closely related to <em> Roman Holiday </em> that we have the most information on (i.e. we are most certain about our recommendation).

Sometimes the movie you want to select is in a cluttered area, obscured by other movie titles. You can sometimes get to the movie you want if you select another movie first, which might free up the movie you are after. Try this now.

‘Liking’ is relative. You can adjust what it means to ‘like’ a movie using the slider at the top of the program. If you slide it to the right, the criteria for ‘liking’ a movie will become tougher, and more of the movies will turn red. Sliding to the left will do the opposite. This is useful if you want to find movies of a specific rating. For instance, to find the most highly rated movie, move the slider to the right until only one movie is still green.

Find and select the movie <em> Spartacus </em> (you may have to use the search menu item), click on it, and then use the slider to make <em> Spartacus </em> light red. Notice that the display shows more red movies, and that movies that are more recommended than <em> Spartacus </em> are still green.

choices="4"> <h2> And Now... </h2> The experiment, but before we get to that, could you please tell us how many times you’ve done this experiment (gone through these questions) before, if at all? If you have done it before, please indicate how many times with the radio buttons below. If this is your first time, just go on to the next question.

see if repetition helps..

<h2> About to load more data </h2> I’m now going to load a new set of movies. The browser will seem to hang while this is done. Click on the -->>next button below, and then please be patient.
<p>
Click next--> to begin.

<h2> The Experiment </h2>

Now that you have a basic understanding of how to use the applet, we’d like to get you to answer some questions. Do what each question asks, and when you’re satisfied that you’ve done it go to the next question.
<p>
<script>

parent.browser.document.squidgeApplet.nextDataSet(); </script>

Find the movie called <em> Children’s Hour, The </em> and select it.

Find the movie called <em> Carrie </em> and select it.

Find and select a movie that we think Gina would not like (recommendation < 40%) that is similar to <em>Carrie</em>.

Select any movie that we are uncertain whether Gina would like or not.

Select any movie about which we are very uncertain (certainty < 30%) but think Gina would not like (recommendation < 50%).


Find the movie called <em>Papillon</em> and select it. You may want to use the search menu.


Find and select <em>Roman Holiday</em>.


Find and select the movie with the most certain recommendation which contains the letters ‘it’.

Find and select <em>Superman II</em>.


Find and select <em>Spellbound</em>.

Find and select the movie that we have the most information about (i.e. we are most certain about either way) out of the ones that are most similar to the currently selected movie.

Find and select a movie that has a recommendation of greater than 80%

Find and select the movie Gina will like most. The slider may help here.


Find and select a movie that has a recommendation of less than 30%

Find and select the movie we think Gina will like least.

Find and select the movie we are most certain about, no matter whether we think it’s good or bad.


Find and select the most recommended movie with the letters ‘man’ in the title.

type="textarea" cols="35" rows="7"> Do you have any comments you’d like to add?

Appendix D

Graphs from the Evaluation

In all graphs, a bullet (•) will indicate a correct answer, while a delta (∆) will indicate an incorrect answer. In some graphs precise alignment with a value is not required. For instance, often there are only four possible x values: 100, 300, 500 and 700. In these cases data points may have been shifted by small random amounts from their correct value to aid readability. The number of data points used in a plot is shown on the graph, often with a value for ‘ol’. This is the number of ‘outlying’ values that were not shown in the plot. Outliers were values that were out of range for the data as described in Section 6.4 on page 106, or, in some rare cases, that were out of range for the graph. In this latter case the value was still used for averages and so forth, but not actually plotted, as doing so generally reduced readability. Averages for each data set are plotted with a dotted line. One standard deviation is marked with a dash in graphs where such a measurement makes sense. There are three graphs shown for each task in the main part of the experiment.

Time taken to answer These are the plots of the time taken to answer each type of question for each set size. If the time taken was more than ten times the average, it was considered an outlying value and removed from the set (this trimming rule is sketched in code after this list). The average shown in the graph is the new average with these outlying values removed. Participants did sometimes get distracted and leave the task for some minutes, so some single outlying values were skewing results. Plots show all non-outlying answers that remain within the range shown in the graph. The average for correct answers is shown with a dotted line, and horizontal bars show 1 standard deviation from that average. Solid vertical lines are also used to indicate the first and third quartile of the data for correct answers. A legend for each data set size shows the number of correct, incorrect and outlying answers.

Percentage of answers that were correct These plots show the number of correct answers for each task as an average. The number of data points used is shown in a legend for each data set size.

Number of steps required to answer These plots show the number of steps taken to answer a question. A step is a title selection, a search, or the use of the slider. All non-outlying answers are shown, and the average and 1 SD error bars are shown for correct answers. Values more than five times the initial average were considered outlying and removed. A legend for each data set size shows the number of correct and incorrect answers, and the number of outlying values removed in the plot.
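The outlier rule used in these plots is simple enough to state in code. The following Java fragment is a minimal sketch of that rule, not the plotting code actually used, and the class and method names are illustrative only: values more than a given factor times the initial mean (ten for answer times, five for step counts) are dropped, and the mean and standard deviation are recomputed from the survivors.

import java.util.ArrayList;
import java.util.List;

public class OutlierFilter {
    // Drop values greater than factor * (initial mean). The plots above
    // use factor 10 for answer times and factor 5 for step counts.
    public static List<Double> removeOutliers(List<Double> values, double factor) {
        double initialMean = mean(values);
        List<Double> kept = new ArrayList<Double>();
        for (double v : values) {
            if (v <= factor * initialMean) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static double mean(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return values.isEmpty() ? 0.0 : sum / values.size();
    }

    // Population standard deviation, used for the 1 SD bars.
    public static double stdDev(List<Double> values) {
        double m = mean(values);
        double sumSquares = 0.0;
        for (double v : values) {
            sumSquares += (v - m) * (v - m);
        }
        return values.isEmpty() ? 0.0 : Math.sqrt(sumSquares / values.size());
    }
}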

D.1 Tutorial

In the tutorial, the graphs of percentage correct are not shown, as they only contain one value (percent correct for 300 items in the tutorial), which is already given in the appropriate tables in Section 6.4.4 on page 111. The other graphs in the tutorial section have only one data set size, and so are shown with their variable on the horizontal axis. The time to answer graphs show the mean as a vertical bar, and 1 standard deviation on each side of the mean as slightly smaller vertical bars. In other respects these graphs are the same as for the main part of the experiment.

[Plot: • 46 correct, ∆ 10 incorrect, 1 outlier; x-axis: Time to answer (sec), 0-200.]
Figure D.1: Time to answer task ‘easy’ in tutorial. Times include the reading time for the task.

[Plot: • 46 correct, ∆ 10 incorrect, 1 outlier; x-axis: Steps to answer, 0-10.]
Figure D.2: Steps taken to correctly complete task ‘easy’ in tutorial.

[Plot: • 48 correct, ∆ 9 incorrect, 0 outliers; x-axis: Time to answer (sec), 0-200.]
Figure D.3: Time to answer task ‘hard’ in tutorial.

[Plot: • 48 correct, ∆ 9 incorrect, 0 outliers; x-axis: Steps to answer, 0-10.]
Figure D.4: Steps taken to correctly complete task ‘hard’ in tutorial.

[Plot: • 35 correct, ∆ 19 incorrect, 3 outliers; x-axis: Time to answer (sec), 0-200.]
Figure D.5: Time to answer task ‘rec’ in tutorial.

[Plot: • 37 correct, ∆ 20 incorrect, 0 outliers; x-axis: Steps to answer, 0-10.]
Figure D.6: Steps taken to correctly complete task ‘rec’ in tutorial.

[Plot: • 56 correct, ∆ 1 incorrect, 0 outliers; x-axis: Time to answer (sec), 0-200.]
Figure D.7: Time to answer task ‘cert’ in tutorial.

[Plot: • 56 correct, ∆ 1 incorrect, 0 outliers; x-axis: Steps to answer, 0-10.]
Figure D.8: Steps taken to correctly complete task ‘cert’ in tutorial.

[Plot: • 19 correct, ∆ 38 incorrect, 0 outliers; x-axis: Time to answer (sec), 0-200.]
Figure D.9: Time to answer task ‘certeasy’ in tutorial.

[Plot: • 19 correct, ∆ 36 incorrect, 2 outliers; x-axis: Steps to answer, 0-10.]
Figure D.10: Steps taken to correctly complete task ‘certeasy’ in tutorial.

[Plot: • 65 correct, ∆ 48 incorrect, 1 outlier; x-axis: Time to answer (sec), 0-200.]
Figure D.11: Time to answer task ‘certrec’ in tutorial.

[Plot: • 64 correct, ∆ 47 incorrect, 3 outliers; x-axis: Steps to answer, 0-10.]
Figure D.12: Steps taken to correctly complete task ‘certrec’ in tutorial.

D.2 Experiment

[Plot: • 20/24/24/33 correct, ∆ 0/4/3/5 incorrect, 0/0/1/0 outliers at set sizes 100/300/500/700; y-axis: Time to answer (sec), 0-200.]
Figure D.13: Time to answer task ‘easy’.

[Plot: n = 20/28/28/38 at set sizes 100/300/500/700; y-axis: Percent Correct, 0-100.]
Figure D.14: Percentage of correct answers for task ‘easy’.

[Plot: • 20/24/25/33 correct, ∆ 0/4/3/5 incorrect, 0/0/0/0 outliers at set sizes 100/300/500/700; y-axis: Steps to answer, 0-10.]
Figure D.15: Steps taken to correctly complete task ‘easy’.

[Plot: • 17/22/22/33 correct, ∆ 3/6/6/5 incorrect, 0/0/0/0 outliers at set sizes 100/300/500/700; y-axis: Time to answer (sec), 0-200.]
Figure D.16: Time to answer task ‘cousin’.

[Plot: n = 20/28/28/38 at set sizes 100/300/500/700; y-axis: Percent Correct, 0-100.]
Figure D.17: Percentage of correct answers for task ‘cousin’.

[Plot: • 17/22/22/33 correct, ∆ 3/6/6/5 incorrect, 0/0/0/0 outliers at set sizes 100/300/500/700; y-axis: Steps to answer, 0-10.]
Figure D.18: Steps taken to correctly complete task ‘cousin’.

[Plot: • 19/19/20/34 correct, ∆ 1/9/8/3 incorrect, 0/0/0/1 outliers at set sizes 100/300/500/700; y-axis: Time to answer (sec), 0-200.]
Figure D.19: Time to answer task ‘hard’.

[Plot: n = 20/28/28/38 at set sizes 100/300/500/700; y-axis: Percent Correct, 0-100.]
Figure D.20: Percentage of correct answers for task ‘hard’.

[Plot: • 19/19/20/35 correct, ∆ 1/9/8/3 incorrect, 0/0/0/0 outliers at set sizes 100/300/500/700; y-axis: Steps to answer, 0-10.]
Figure D.21: Steps taken to correctly complete task ‘hard’.

[Plot: • 11/12/10/6 correct, ∆ 9/16/17/29 incorrect, 0/0/1/1 outliers at set sizes 100/300/500/700; y-axis: Time to answer (sec), 0-200.]
Figure D.22: Time to answer task ‘recslider’.

[Plot: n = 20/28/28/36 at set sizes 100/300/500/700; y-axis: Percent Correct, 0-100.]
Figure D.23: Percentage of correct answers for task ‘recslider’.

[Plot: • 11/11/11/6 correct, ∆ 9/16/16/29 incorrect, 0/1/1/1 outliers at set sizes 100/300/500/700; y-axis: Steps to answer, 0-10.]
Figure D.24: Steps taken to correctly complete task ‘recslider’.

[Plot: • 16/24/24/25 correct, ∆ 4/3/4/12 incorrect, 0/1/0/0 outliers at set sizes 100/300/500/700; y-axis: Time to answer (sec), 0-200.]
Figure D.25: Time to answer task ‘cert’.

[Plot: n = 20/28/28/37 at set sizes 100/300/500/700; y-axis: Percent Correct, 0-100.]
Figure D.26: Percentage of correct answers for task ‘cert’.

[Plot: • 16/25/24/24 correct, ∆ 4/3/4/12 incorrect, 0/0/0/1 outliers at set sizes 100/300/500/700; y-axis: Steps to answer, 0-10.]
Figure D.27: Steps taken to correctly complete task ‘cert’.

[Plot: • 16/24/25/19 correct, ∆ 4/4/2/15 incorrect, 0/0/1/2 outliers at set sizes 100/300/500/700; y-axis: Time to answer (sec), 0-200.]
Figure D.28: Time to answer task ‘rec’.

[Plot: n = 20/28/28/36 at set sizes 100/300/500/700; y-axis: Percent Correct, 0-100.]
Figure D.29: Percentage of correct answers for task ‘rec’.

[Plot: • 16/24/25/16 correct, ∆ 4/4/2/17 incorrect, 0/0/1/3 outliers at set sizes 100/300/500/700; y-axis: Steps to answer, 0-10.]
Figure D.30: Steps taken to correctly complete task ‘rec’.

[Plot: • 3/1/4/3 correct, ∆ 7/12/9/14 incorrect, 0/1/1/1 outliers at set sizes 100/300/500/700; y-axis: Time to answer (sec), 0-200.]
Figure D.31: Time to answer task ‘certeasy’.

[Plot: n = 10/14/14/18 at set sizes 100/300/500/700; y-axis: Percent Correct, 0-100.]
Figure D.32: Percentage of correct answers for task ‘certeasy’.

[Plot: • 3/1/4/3 correct, ∆ 7/13/9/15 incorrect, 0/0/1/0 outliers at set sizes 100/300/500/700; y-axis: Steps to answer, 0-10.]
Figure D.33: Steps taken to correctly complete task ‘certeasy’.

[Plot: • 9/10/11/13 correct, ∆ 1/4/1/5 incorrect, 0/0/2/1 outliers at set sizes 100/300/500/700; y-axis: Time to answer (sec), 0-200.]
Figure D.34: Time to answer task ‘certhard’.

[Plot: n = 10/14/14/19 at set sizes 100/300/500/700; y-axis: Percent Correct, 0-100.]
Figure D.35: Percentage of correct answers for task ‘certhard’.

[Plot: • 9/10/13/14 correct, ∆ 1/4/1/5 incorrect, 0/0/0/0 outliers at set sizes 100/300/500/700; y-axis: Steps to answer, 0-10.]
Figure D.36: Steps taken to correctly complete task ‘certhard’.

[Plot: • 9/5/5/2 correct, ∆ 1/9/9/16 incorrect, 0/0/0/0 outliers at set sizes 100/300/500/700; y-axis: Time to answer (sec), 0-200.]
Figure D.37: Time to answer task ‘hardrec’.

[Plot: n = 10/14/14/18 at set sizes 100/300/500/700; y-axis: Percent Correct, 0-100.]
Figure D.38: Percentage of correct answers for task ‘hardrec’.

[Plot: • 9/5/5/2 correct, ∆ 1/9/9/16 incorrect, 0/0/0/0 outliers at set sizes 100/300/500/700; y-axis: Steps to answer, 0-10.]
Figure D.39: Steps taken to correctly complete task ‘hardrec’.

[Plot: n = 10/14/14/18 at set sizes 100/300/500/700; y-axis: Percent Correct, 0-100.]
Figure D.40: Percentage of correct answers for task ‘hardrec’, taking Roman Holiday as a correct answer in all data sets.

[Plot: • 5/9/8/4 correct, ∆ 5/5/5/13 incorrect, 0/0/1/2 outliers at set sizes 100/300/500/700; y-axis: Time to answer (sec), 0-200.]
Figure D.41: Time to answer task ‘easyrec’.

[Plot: n = 10/14/14/19 at set sizes 100/300/500/700; y-axis: Percent Correct, 0-100.]
Figure D.42: Percentage of correct answers for task ‘easyrec’.

[Plot: • 5/9/8/5 correct, ∆ 5/5/6/13 incorrect, 0/0/0/1 outliers at set sizes 100/300/500/700; y-axis: Steps to answer, 0-10.]
Figure D.43: Steps taken to correctly complete task ‘easyrec’.

[Plot: • 8/13/13/11 correct, ∆ 2/1/1/7 incorrect, 0/0/0/1 outliers at set sizes 100/300/500/700; y-axis: Time to answer (sec), 0-200.]
Figure D.44: Time to answer task ‘certrec’.

[Plot: n = 10/14/14/19 at set sizes 100/300/500/700; y-axis: Percent Correct, 0-100.]
Figure D.45: Percentage of correct answers for task ‘certrec’.

[Plot: • 8/13/13/11 correct, ∆ 2/1/1/7 incorrect, 0/0/0/1 outliers at set sizes 100/300/500/700; y-axis: Steps to answer, 0-10.]
Figure D.46: Steps taken to correctly complete task ‘certrec’.

Figure 6.16 on page 123 shows the average time to answer over all tasks. Figure 6.17 on page 123 shows the average number of steps to complete a task.

[Plot: n = 200/280/280/372 at set sizes 100/300/500/700; y-axis: Percent Correct, 0-100.]
Figure D.47: Average percent correct.

Appendix E

RDF File of a Movie Recommendation Model

The file opens with an RDF element that declares the namespaces xmlns:dc="http://purl.org/metadata/dublin_core/" and xmlns:gmp="http://www.gmp.usyd.edu.au/schema/resources/".

The first component represents the beliefs about the movie Crook Buster. The ‘about’ attribute gives a URL to the movie page, and the ‘dc:Title’ gives the movie name, Crook Buster. The results node gives the dataset, score and certainty. There is no belief about this movie: the score is 0.5 but has no certainty. Then come the links to peers.

The next component is for Guy Named Joe, A. There is an uncertain belief about this movie, that the user will like it (0.74).

Then comes Directed by William Wyler. There is an uncertain belief about this movie, that the user will love it (0.95). It seems that the few people who have watched this were impressed.

The last component shown is Star Wars: Episode I - The Phantom Menace. This movie had the most data for the belief value. On the whole, people thought it was pretty good.
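The RDF markup itself is best illustrated with a sketch. The fragment below shows the shape one component might take; the gmp:mark and gmp:reliability attribute names are attested by the fragment that survives in Appendix F, but the rdf:Description, gmp:results and gmp:peer element names and the example.org URLs are illustrative assumptions rather than the exact schema.

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/metadata/dublin_core/"
         xmlns:gmp="http://www.gmp.usyd.edu.au/schema/resources/">
  <!-- One component: the 'about' URL identifies the movie, dc:Title names
       it, the results node carries the dataset, score and certainty, and
       each peer link points at a similar movie. -->
  <rdf:Description about="http://example.org/movies/crook-buster">
    <dc:Title>Crook Buster</dc:Title>
    <gmp:results gmp:dataset="movies" gmp:mark="0.5" gmp:reliability="0.0"/>
    <gmp:peer rdf:resource="http://example.org/movies/guy-named-joe-a"/>
  </rdf:Description>
</rdf:RDF>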

Appendix F

RDF File of an Online Assessment Model

The file again opens with an RDF element declaring the namespaces xmlns:dc="http://purl.org/metadata/dublin_core/" and xmlns:gmp="http://www.gmp.usyd.edu.au/schema/resources/".

This is a component from the Online Assessment Model. The URL is (unfortunately) synthetic; the GMP web site was not re-architected to allow free access to the learning topics by simple URL. The topic has a title, Normal Weight, and some peers. It also has four associated beliefs: one for the specific user, one for the user 1 month ago, one for the user’s cohort, and one for a ‘good’ student as defined by the faculty. Each belief holds a mark and a reliability, for example gmp:mark="0.5" gmp:reliability="0.14".

The next component is for the topic Macro-nutrients. In this topic, the student has done only some questions, but has done very well in them. It seems that they were done more than one month ago, since the values have not changed in that time. The cohort is doing less well.
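A sketch of the shape such a topic component might take follows, using the same assumed element names as the sketch in Appendix E. Only the gmp:mark="0.5" gmp:reliability="0.14" pair is attested above; the gmp:belief element, its gmp:source labels, the remaining values and the URLs are illustrative assumptions.

<rdf:Description about="http://www.gmp.usyd.edu.au/topics/normal-weight">
  <dc:Title>Normal Weight</dc:Title>
  <gmp:peer rdf:resource="http://www.gmp.usyd.edu.au/topics/macro-nutrients"/>
  <!-- Four beliefs: the user now, the user one month ago, the user's
       cohort, and the faculty-defined 'good' student. Only the first
       mark/reliability pair comes from the surviving fragment. -->
  <gmp:belief gmp:source="user" gmp:mark="0.5" gmp:reliability="0.14"/>
  <gmp:belief gmp:source="userOneMonthAgo" gmp:mark="0.5" gmp:reliability="0.14"/>
  <gmp:belief gmp:source="cohort" gmp:mark="0.6" gmp:reliability="0.3"/>
  <gmp:belief gmp:source="goodStudent" gmp:mark="0.8" gmp:reliability="1.0"/>
</rdf:Description>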

Appendix G

Movie Experiment Logs

These are three logs from the experiment as described in Section 5.3.2 on page 94. Three representative sessions were chosen. In each case I have annotated the logs by including comments, which are lines in a different style. Long lines have been wrapped for readability, and line continuation in these cases is shown by a four character indent.
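Each log line has the three ‘|||’-separated fields described in the annotations of the first log: a timestamp, the milliseconds since the previous entry, and a message. The Java fragment below is a minimal sketch of how such a line can be split apart; the class and field names are mine, not those of the experiment software.

public class LogEntry {
    final long timestamp;  // milliseconds since the epoch
    final long sinceLast;  // milliseconds since the previous entry
    final String message;  // e.g. "slider to 49" or "searched for man"

    LogEntry(String line) {
        // Split on the literal "|||" separator, keeping the message whole.
        String[] fields = line.split("\\|\\|\\|", 3);
        this.timestamp = Long.parseLong(fields[0]);
        this.sinceLast = Long.parseLong(fields[1]);
        this.message = fields[2];
    }

    public static void main(String[] args) {
        LogEntry e = new LogEntry("945487919550|||8240|||slider to 49");
        System.out.println(e.message + " after " + e.sinceLast + " ms");
    }
}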

G.1 Tasks Done Well

945391761146.log

#log format v 2.1
945487068860|||0|||start session
945487068920|||60|||doAnimate to true
945487071330|||2410|||model is http://people.gmp.usyd.edu.au/hemul/sq2p/gendata/movies300.rdf
945487078140|||6810|||model size 300
The following is a measure of the interconnectedness and clumpiness of the graph. It shows the max, min and average number of peers for each node.
945487078140|||0|||maxPeers 6 min 6 average 6.0
This is a topic, or movie, selection (or, in this case, a scripted setting). The log entry shows the timestamp, the time since the last log entry, the title of the movie, its index in a question database, the mark, or recommendation, the reliability, or certainty, and the peers of the selected title separated by a vertical bar.
945487079020|||880|||set topic to |Directed by William Wyler|, index 289, mark 0.95, reliability 0.261391, peers 33|171|34|121|135|147|
945487079130|||110|||starting animation
945487079630|||500|||animation done - held 450
The data set displayed is set to What you would like.
945487079630|||0|||changed to display 0 - What you would like
The first question is presented.
945487081110|||1480|||presented Question 0 id ’t-welcome’
There was an attempt to resize the browser window.
945487081550|||440|||resize to x=350, y=645
The participant comments that they can’t resize the browser window. This is true. The window size was fixed to reduce the number of variables in the experiment.
945487188380|||106830|||Question 0 feedback Can’t resize
The first question requiring an answer. The participant was prompted for their age, which they gave. They say they are 20 years old.
945487189040|||660|||presented Question 1 id ’age’
945487195140|||6100|||Question 1 response 20
945487195140|||0|||Question 1 feedback
945487195580|||440|||presented Question 2 id ’’
945487208260|||12680|||Question 2 feedback
945487208590|||330|||presented Question 3 id ’’
945487216670|||8080|||Question 3 feedback
The display is being reset. This is to put it in a known state.
945487217000|||330|||--- resetting ---
945487217000|||0|||changed to display 0 - What you would like
945487217880|||880|||set topic to |Directed by William Wyler|, index 289, mark 0.95, reliability 0.261391, peers 33|171|34|121|135|147|
945487217880|||0|||--- reset ---
Resetting complete.
945487217930|||50|||presented Question 4 id ’’
945487343000|||125070|||Question 4 feedback
945487343380|||380|||presented Question 5 id ’’
945487411760|||68380|||Question 5 feedback
The display is reset again in case the participant has played with it during the last few questions.
945487412090|||330|||--- resetting ---
945487412090|||0|||changed to display 0 - What you would like
945487412920|||830|||set topic to |Directed by William Wyler|, index 289, mark 0.95, reliability 0.261391, peers 33|171|34|121|135|147|
945487412920|||0|||--- reset ---
The first real question of the tutorial. The participant is asked to click on the movie Sabrina, which, in this case, they do. The animation took 880 milliseconds, of which 80 were a deliberate slowing.
945487412970|||50|||presented Question 6 id ’sabrina’
945487416650|||3680|||clicked on |Sabrina|, index 135, mark 0.77, reliability 0.713675, peers 34|289|151|73|134|150|
945487416650|||0|||starting animation
945487417530|||880|||animation done - held 80
945487515350|||97820|||Question 6 feedback
A question with no response required.
945487515850|||500|||presented Question 7 id ’’
945487542590|||26740|||Question 7 feedback
The question asked the participant to select a movie with a recommendation of < 40%. They selected two such movies. For the experiment we considered their response to be the last movie selected. The participant seems to be spending quite a bit of time searching for the answer (78 seconds and 17 seconds).
945487542870|||280|||presented Question 8 id ’notlike1’
945487621080|||78210|||clicked on |Billy Two Hats|, index 235, mark 0.33, reliability 0.211914, peers 209|299|208|43|58|25|
945487621080|||0|||starting animation
945487622020|||940|||animation done - held 80
945487638880|||16860|||clicked on |Captain Sindbad|, index 187, mark 0.33, reliability 0.315005, peers 120|118|297|88|207|117|
945487638880|||0|||starting animation
945487639760|||880|||animation done - held 80
945487641570|||1810|||Question 8 feedback
The question asked what the recommendation of the previous selection was. The participant answered correctly (33.0%).
945487641900|||330|||presented Question 9 id ’trecommendRadio’
945487654370|||12470|||Question 9 response 33.0
945487654370|||0|||Question 9 feedback
945487654750|||380|||presented Question 10 id ’’
945487677770|||23020|||Question 10 feedback
945487678150|||380|||presented Question 11 id ’tunc’
945487683640|||5490|||clicked on |Lone Star, The|, index 18, mark 0.5, reliability 0.0, peers 9|20|198|243|197|75|
945487683640|||0|||starting animation
945487684520|||880|||animation done - held 40
945487687650|||3130|||Question 11 feedback
This is a wrong answer, but close. We were looking for a recommendation of >= 70%. Their selection had a recommendation of 66%. This is a good attempt, since the question also required a certainty of <=0.3 and >=0.1, so there was a lot to think about.
945487688040|||390|||presented Question 12 id ’tunc0’
945487728680|||40640|||clicked on |Another Time, Another Place|, index 157, mark 0.66, reliability 0.182596, peers 78|17|27|246|160|75|
945487728680|||0|||starting animation
945487729670|||990|||animation done - held 40
945487733520|||3850|||Question 12 feedback
945487733950|||430|||presented Question 13 id ’t--certain-like-0’
945487742580|||8630|||clicked on |Lawrence of Arabia|, index 183, mark 0.86, reliability 0.885996, peers 139|215|211|124|199|284|
945487742580|||0|||starting animation
945487743510|||930|||animation done - held 80
945487758560|||15050|||Question 13 feedback
945487758950|||390|||presented Question 14 id ’t-harddaysearch---0’
945487806350|||47400|||searched for day
945487806400|||50|||starting animation
945487807450|||1050|||animation done - held 120
945487823370|||15920|||clicked on |Roman Holiday|, index 130, mark 0.8, reliability 0.764539, peers 162|190|224|2|126|161|
945487823370|||0|||starting animation
945487824310|||940|||animation done - held 80
945487827330|||3020|||Question 14 feedback
This question asks for the most certain movie that is a peer of Roman Holiday. This participant made the common mistake of simply selecting the most certain movie, whether or not it was a peer of Roman Holiday.
945487827880|||550|||presented Question 15 id ’t-easy-certain-like-0’
945487848360|||20480|||clicked on |Star Wars: Episode I - The Phantom Menace|, index 296, mark 0.77, reliability 1.0, peers 298|54|293|32|245|136|
945487848360|||0|||starting animation
945487849300|||940|||animation done - held 80
945487873300|||24000|||Question 15 feedback
In the following two questions the participant is simply playing. No response is actually required.
945487873740|||440|||presented Question 16 id ’’
945487892140|||18400|||clicked on |Young Doctors, The|, index 172, mark 0.69, reliability 0.211914, peers 138|265|269|279|277|278|
945487892140|||0|||starting animation
945487893020|||880|||animation done - held 40
945487906750|||13730|||clicked on |Roots of Heaven, The|, index 159, mark 0.49, reliability 0.294555, peers 156|205|179|252|253|259|
945487906750|||0|||starting animation
945487907740|||990|||animation done - held 120
945487911030|||3290|||Question 16 feedback
945487911310|||280|||presented Question 17 id ’’
Here the slider is being used.
945487919550|||8240|||slider to 49
945487920540|||990|||slider to 21
945487927460|||6920|||slider to 22
945487928500|||1040|||slider to 79
945487954100|||25600|||slider to 80
945487955080|||980|||slider to 98
945487956130|||1050|||slider to 99
945487959480|||3350|||slider to 98
945487960470|||990|||slider to 90
945487965850|||5380|||clicked on |Directed by William Wyler|, index 289, mark 0.95, reliability 0.261391, peers 33|171|34|121|135|147|
945487965850|||0|||starting animation
945487966840|||990|||animation done - held 40
945487976510|||9670|||Question 17 feedback
This question asked the user to find the movie Spartacus, and then introduces the use of the slider to find movies that are above or below a particular recommendation.
945487976890|||380|||presented Question 18 id ’t-spartacu-0’
945487987600|||10710|||clicked on |Gunfighter, The|, index 112, mark 0.71, reliability 0.508569, peers 162|109|116|125|158|295|
945487987600|||0|||starting animation
945487988480|||880|||animation done - held 40
The participant searches for spart to find Spartacus.
945487998370|||9890|||searched for spart
945487998370|||0|||starting animation
945487999300|||930|||animation done - held 80
945488000780|||1480|||clicked on |Spartacus|, index 169, mark 0.79, reliability 0.804729, peers 102|151|77|257|188|283|
945488000780|||0|||starting animation
945488001880|||1100|||animation done - held 80
And Spartacus is found. The slider is then explored as prompted by the question.
945488004080|||2200|||slider to 89
945488005070|||990|||slider to 9
945488006110|||1040|||slider to 14
945488019070|||12960|||slider to 15
945488020120|||1050|||slider to 44
945488021820|||1700|||slider to 45
945488022860|||1040|||slider to 54
945488023960|||1100|||slider to 59
945488025010|||1050|||slider to 61
945488025990|||980|||slider to 66
945488027040|||1050|||slider to 71
945488028030|||990|||slider to 74
945488029070|||1040|||slider to 78
945488030110|||1040|||slider to 86
945488031100|||990|||slider to 92
945488032150|||1050|||slider to 89
945488033190|||1040|||slider to 83
945488036650|||3460|||slider to 82
945488037690|||1040|||slider to 75
945488040490|||2800|||slider to 74
945488041480|||990|||slider to 66
945488042530|||1050|||slider to 74
945488043520|||990|||slider to 75
945488072790|||29270|||Question 18 feedback
945488073230|||440|||presented Question 19 id ’doneBefore’
945488099480|||26250|||Question 19 response 0
945488099480|||0|||Question 19 feedback
945488099920|||440|||presented Question 20 id ’’
945488109320|||9400|||Question 20 feedback
VlUM is now told to load a new data set. The set is chosen at random from 100, 300, 500 or 700 item sets. In this case the 500 item set is chosen.
945488109810|||490|||--- moving to new data set ---
945488109810|||0|||model is http://people.gmp.usyd.edu.au/hemul/sq2p/gendata/movies500.rdf
945488115080|||5270|||model size 500
In the 500 item set there were some nodes that had fewer than 6 peers, so the average drops slightly.
945488115080|||0|||maxPeers 6 min 4 average 5.996
945488115960|||880|||set topic to |Directed by William Wyler|, index 476, mark 0.95, reliability 0.261391, peers 47|250|54|167|196|209|
945488115960|||0|||starting animation
945488117390|||1430|||animation done - held 0
945488117390|||0|||changed to display 0 - What you would like
945488118540|||1150|||presented Question 21 id ’’
945488148150|||29610|||Question 21 feedback
The first question of the main experiment. The question asks the participant to select a particular title that is a peer of the current selection. This participant uses the search function to do this, which is interesting because the answer should be easily visible.
945488148480|||330|||presented Question 22 id ’chourSelect’
945488158090|||9610|||searched for child
945488158090|||0|||starting animation
945488159460|||1370|||animation done - held 0
945488160890|||1430|||clicked on |Children’s Hour, The|, index 250, mark 0.81, reliability 0.499133, peers 180|47|306|61|476|106|
945488160890|||0|||starting animation
945488162100|||1210|||animation done - held 0
945488173910|||11810|||Question 22 feedback
945488174350|||440|||presented Question 23 id ’carrieSelect’
945488177260|||2910|||clicked on |Carrie|, index 180, mark 0.7, reliability 0.332029, peers 187|47|250|373|384|376|
945488177260|||0|||starting animation
945488178470|||1210|||animation done - held 0
945488180500|||2030|||Question 23 feedback
945488180830|||330|||presented Question 24 id ’whiffsSelect’
945488192530|||11700|||clicked on |Whiffs|, index 384, mark 0.34, reliability 0.211914, peers 276|180|202|373|376|379|
945488192530|||0|||starting animation
945488193740|||1210|||animation done - held 0
945488201650|||7910|||Question 24 feedback
945488202250|||600|||presented Question 25 id ’lowReliab’
945488211480|||9230|||clicked on |Ridin’ for Love|, index 8, mark 0.5, reliability 0.0, peers 187|0|7|4|1|6|
945488211480|||0|||starting animation
945488212850|||1370|||animation done - held 0
945488218950|||6100|||Question 25 feedback
In this case, the participant found an almost correct answer, but then selected a correct answer a few seconds later.
945488219390|||440|||presented Question 26 id ’highReliab’
945488250140|||30750|||clicked on |Hemingway’s Adventures of a Young Man|, index 258, mark 0.5, reliability 0.300064, peers 218|293|134|435|478|450|
945488250140|||0|||starting animation
945488251350|||1210|||animation done - held 0
945488260030|||8680|||clicked on |Oh Dad, Poor Dad, Mama’s Hung You in the Closet and I’m Feeling So Sad|, index 313, mark 0.47, reliability 0.294555, peers 289|85|170|287|72|415|
945488260030|||0|||starting animation
945488261240|||1210|||animation done - held 0
945488272610|||11370|||Question 26 feedback
945488272990|||380|||presented Question 27 id ’papillonSelect’
945488281120|||8130|||searched for pap
945488281120|||0|||starting animation
945488282330|||1210|||animation done - held 0
945488284030|||1700|||clicked on |Papillon|, index 364, mark 0.76, reliability 0.773986, peers 345|247|325|434|57|56|
945488284030|||0|||starting animation
945488285410|||1380|||animation done - held 0
945488289420|||4010|||Question 27 feedback
Again, the answer to this question should have been quite visible, so the use of the search function is a slow approach.
945488289750|||330|||presented Question 28 id ’rhSelect’
945488298480|||8730|||searched for roman
945488298530|||50|||starting animation
945488299740|||1210|||animation done - held 0
945488300680|||940|||clicked on |Roman Holiday|, index 187, mark 0.8, reliability 0.764539, peers 236|276|345|8|180|233|
945488300680|||0|||starting animation
945488301880|||1200|||animation done - held 0
945488302760|||880|||Question 28 feedback
945488303150|||390|||presented Question 29 id ’anotherCert’
945488311110|||7960|||searched for it
945488311110|||0|||starting animation
945488312490|||1380|||animation done - held 0
945488336100|||23610|||clicked on |Witness for the Prosecution|, index 216, mark 0.83, reliability 0.670908, peers 218|132|196|222|245|149|
945488336100|||0|||starting animation
945488337370|||1270|||animation done - held 0
945488340000|||2630|||Question 29 feedback
945488340830|||830|||presented Question 30 id ’smSelect’
945488347970|||7140|||searched for man ii
945488347970|||0|||starting animation
945488349230|||1260|||animation done - held 0
945488350270|||1040|||clicked on |Superman II|, index 436, mark 0.63, reliability 0.788709, peers 485|411|483|488|479|403|
945488350270|||0|||starting animation
945488351540|||1270|||animation done - held 0
945488353350|||1810|||Question 30 feedback
945488353900|||550|||presented Question 31 id ’sbSelect’
945488367960|||14060|||searched for spell
945488367960|||0|||starting animation
945488369170|||1210|||animation done - held 0
945488371420|||2250|||clicked on |Spellbound|, index 120, mark 0.76, reliability 0.658002, peers 113|119|485|169|135|132|
945488371420|||0|||starting animation
945488372630|||1210|||animation done - held 0
945488377630|||5000|||Question 31 feedback
945488378010|||380|||presented Question 32 id ’tochSelect’
945488429480|||51470|||clicked on |Gentleman’s Agreement|, index 135, mark 0.72, reliability 0.49286, peers 212|93|163|272|120|127|
945488429480|||0|||starting animation
945488430680|||1200|||animation done - held 0
945488433380|||2700|||Question 32 feedback
Here we see some good use of the slider.
945488433650|||270|||presented Question 33 id ’rec80’
945488441120|||7470|||slider to 51
945488442160|||1040|||slider to 80
945488443260|||1100|||slider to 81
945488447380|||4120|||clicked on |Best Years of Our Lives, The|, index 125, mark 0.84, reliability 0.710092, peers 199|162|94|103|85|257|
945488447380|||0|||starting animation
945488448650|||1270|||animation done - held 0
945488451170|||2520|||Question 33 feedback
945488451560|||390|||presented Question 34 id ’highRec’
945488465950|||14390|||clicked on |To Kill a Mockingbird|, index 264, mark 0.85, reliability 0.890636, peers 237|220|301|370|190|64|
945488465950|||0|||starting animation
945488467210|||1260|||animation done - held 0
945488471550|||4340|||slider to 82
945488472650|||1100|||slider to 89
945488473750|||1100|||slider to 90
945488474740|||990|||slider to 94
945488477430|||2690|||clicked on |Godfather Trilogy: 1901-1980, The|, index 488, mark 0.91, reliability 0.527496, peers 436|411|483|463|268|479|
945488477430|||0|||starting animation
945488478690|||1260|||animation done - held 0
945488494290|||15600|||clicked on |Directed by William Wyler|, index 476, mark 0.95, reliability 0.261391, peers 47|250|54|167|196|209|
945488494290|||0|||starting animation
945488495550|||1260|||animation done - held 0
945488498680|||3130|||slider to 95
945488499780|||1100|||slider to 100
945488517630|||17850|||Question 34 feedback
Once again, the following question shows good use of the slider. The number of selections before finding an answer demonstrates the need most participants had to ‘hunt’ for an answer to questions that asked them to find titles of a certain recommendation.
945488518130|||500|||presented Question 35 id ’rec81’
945488526370|||8240|||slider to 98
945488527630|||1260|||slider to 26
945488540150|||12520|||clicked on |Sicilian, The|, index 479, mark 0.51, reliability 0.499891, peers 436|483|488|493|316|496|
945488540150|||0|||starting animation
945488541360|||1210|||animation done - held 0
945488542290|||930|||clicked on |Kiss the Sky|, index 494, mark 0.35, reliability 0.182596, peers 481|492|480|174|490|490|
945488542290|||0|||starting animation
945488543560|||1270|||animation done - held 0
945488554490|||10930|||clicked on |Legend of the Werewolf|, index 383, mark 0.33, reliability 0.261391, peers 374|224|204|366|143|317|
945488554490|||0|||starting animation
945488555750|||1260|||animation done - held 0
945488568330|||12580|||slider to 25
945488569430|||1100|||slider to 21
945488574210|||4780|||clicked on |Captain Sindbad|, index 275, mark 0.33, reliability 0.315005, peers 163|213|385|263|49|394|
945488574210|||0|||starting animation
945488575470|||1260|||animation done - held 0
945488576460|||990|||clicked on |Great Catherine|, index 322, mark 0.18, reliability 0.223917, peers 200|339|265|388|444|381|
945488576460|||0|||starting animation
945488577670|||1210|||animation done - held 0
945488583050|||5380|||Question 35 feedback
The following question required the selection of the movie with the lowest recommendation. This participant found the movie (Great Catherine) quickly, but then tried to find a better answer. When they realised that there was no better answer they used the search function to find Great Catherine again.
945488583490|||440|||presented Question 36 id ’lowRec’
945488593810|||10320|||slider to 20
945488594860|||1050|||slider to 17
945488598370|||3510|||slider to 16
945488599470|||1100|||slider to 11
945488605950|||6480|||clicked on |Monster Island|, index 443, mark 0.2, reliability 0.261391, peers 400|372|91|458|414|363|
945488605950|||0|||starting animation
945488607160|||1210|||animation done - held 0
945488613090|||5930|||back to 322
945488613090|||0|||set topic to |Great Catherine|, index 322, mark 0.18, reliability 0.223917, peers 200|339|265|388|444|381|
945488613090|||0|||starting animation
945488614360|||1270|||animation done - held 0
945488618260|||3900|||clicked on |"Masada"|, index 441, mark 0.5, reliability 0.0, peers 497|355|361|490|415|474|
945488618260|||0|||starting animation
945488619460|||1200|||animation done - held 0
945488620340|||880|||clicked on |Yesterday|, index 445, mark 0.8, reliability 0.211914, peers 256|159|259|257|79|128|
945488620340|||0|||starting animation
945488621550|||1210|||animation done - held 0
945488627980|||6430|||clicked on |Rainbow Thief, The|, index 484, mark 0.63, reliability 0.268943, peers 283|363|314|230|338|323|
945488627980|||0|||starting animation
945488629190|||1210|||animation done - held 0
945488629840|||650|||clicked on |Conan the Destroyer|, index 465, mark 0.51, reliability 0.726457, peers 487|179|48|213|393|150|
945488629840|||0|||starting animation
945488631050|||1210|||animation done - held 0
945488632150|||1100|||clicked on |Hunter, The|, index 434, mark 0.58, reliability 0.432959, peers 364|247|306|337|416|412|
945488632150|||0|||starting animation
945488633360|||1210|||animation done - held 0
945488634510|||1150|||clicked on |Horsemen, The|, index 343, mark 0.45, reliability 0.300064, peers 325|401|366|326|63|207|
945488634510|||0|||starting animation
945488635780|||1270|||animation done - held 0
945488642040|||6260|||searched for cather
945488642090|||50|||starting animation
945488643250|||1160|||animation done - held 0
945488645280|||2030|||clicked on nothing
945488646710|||1430|||clicked on |Great Catherine|, index 322, mark 0.18, reliability 0.223917, peers 200|339|265|388|444|381|
945488646710|||0|||starting animation
945488647920|||1210|||animation done - held 0
945488650610|||2690|||slider to 10
945488651650|||1040|||slider to 6
945488652750|||1100|||slider to 5
945488662090|||9340|||slider to 4
945488663180|||1090|||slider to 2
945488667470|||4290|||Question 36 feedback
945488667910|||440|||presented Question 37 id ’highRel’
945488676700|||8790|||clicked on |Star Wars: Episode I - The Phantom Menace|, index 498, mark 0.77, reliability 1.0, peers 491|85|170|492|334|174|
945488676700|||0|||starting animation
945488677900|||1200|||animation done - held 0
945488681420|||3520|||Question 37 feedback
The following question required the participant to select the most recommended movie with ‘man’ in the title. Here, they started out well, searching for ‘man’, and then using the slider. For some reason they selected a title that was not the most recommended, although it was the most certain.
945488681750|||330|||presented Question 38 id ’manRel’
945488688830|||7080|||searched for man
945488688890|||60|||starting animation
945488690100|||1210|||animation done - held 0
945488692240|||2140|||slider to 3
945488693280|||1040|||slider to 50
945488704710|||11430|||clicked on |Superman|, index 411, mark 0.68, reliability 0.840705, peers 485|436|268|488|403|208|
945488704710|||0|||starting animation
945488706140|||1430|||animation done - held 0
945488708770|||2630|||Question 38 feedback
945488709270|||500|||presented Question 39 id ’otherComments’
945488813680|||104410|||Question 39 response It would be nice to set the slider’s tolerance to be at the changing point between red and green of the current movie. This would allow one to see movies which are better or worse than the current fairly easily.
945488813680|||0|||Question 39 feedback
945488814170|||490|||--rlog save--

G.2 An Average Session

944561932912.log

#log format v 2.1 944560935820|||50|||start session 944560935820|||0|||doAnimate to true 944560938520|||2700|||model is http://people.gmp.usyd.edu.au/hemul/

5 sq2p/gendata/movies300.rdf 944560946150|||7630|||model size 300 944560946150|||0|||maxPeers 6 min 6 average 6.0 The model is read, parsed and analyzed. 944560947080|||930|||set topic to |Directed by William Wyler|,

10 index 289, mark 0.95, reliability 0.261391, peers 33|171|34|121|135|147| 944560947140|||60|||starting animation 944560947520|||380|||animation done - held 420 944560947520|||0|||changed to display 0 - What you would like

15 944560949060|||1540|||presented Question 0 id ’t-welcome’ 944560949500|||440|||resize to x=350, y=645 944560998930|||49430|||Question 0 feedback 944560999100|||170|||presented Question 1 id ’age’ 944561006460|||7360|||Question 1 response 19

20 944561006460|||0|||Question 1 feedback 944561006570|||110|||presented Question 2 id ’’ 944561017440|||10870|||Question 2 feedback 944561017500|||60|||presented Question 3 id ’’ 944561023590|||6090|||Question 3 feedback

25 944561023700|||110|||--- resetting --- 944561023700|||0|||changed to display 0 - What you would like 944561024640|||940|||set topic to |Directed by William Wyler|, index 289, mark 0.95, reliability 0.261391, peers 33|171|34|121|135|147|

30 944561024640|||0|||--- reset --- 944561024750|||110|||presented Question 4 id ’’ 944561036170|||11420|||Question 4 feedback 944561036280|||110|||presented Question 5 id ’’ 944561070230|||33950|||Question 5 feedback

35 944561070340|||110|||--- resetting --- 200 Movie Experiment Logs

944561070340|||0|||changed to display 0 - What you would like 944561071330|||990|||set topic to |Directed by William Wyler|, index 289, mark 0.95, reliability 0.261391, peers 33|171|34|121|135|147|

40 944561071330|||0|||--- reset --- Here is the first real question. It is answered correctly. 944561071380|||50|||presented Question 6 id ’sabrina’ 944561080550|||9170|||clicked on |Sabrina|, index 135, mark 0.77, reliability 0.713675, peers 34|289|151|73|134|150|

45 944561080550|||0|||starting animation 944561081380|||830|||animation done - held 180 944561116140|||34760|||Question 6 feedback 944561116250|||110|||presented Question 7 id ’’ 944561155800|||39550|||Question 7 feedback

50 The second real question, again answered correctly. 944561155910|||110|||presented Question 8 id ’notlike1’ 944561223580|||67670|||clicked on |Great Catherine|, index 211, mark 0.18, reliability 0.223917, peers 139|215|183|254|282|245| 944561223580|||0|||starting animation

55 944561224400|||820|||animation done - held 170 944561228470|||4070|||Question 8 feedback And the participant can identify the mark correctly. 944561228630|||160|||presented Question 9 id ’trecommendRadio’ 944561239230|||10600|||Question 9 response 18

60 944561239230|||0|||Question 9 feedback 944561239340|||110|||presented Question 10 id ’’ 944561270270|||30930|||Question 10 feedback And the following is again correct. 944561270380|||110|||presented Question 11 id ’tunc’

65 944561277950|||7570|||clicked on |Your Ticket Is No Longer Valid|, index 282, mark 0.5, reliability 0.0, peers 211|201|108|48|113|122| 944561277950|||0|||starting animation 944561278830|||880|||animation done - held 80

70 944561280210|||1380|||Question 11 feedback And correct again. 944561280320|||110|||presented Question 12 id ’tunc0’ 944561308380|||28060|||clicked on |Columbo: Bye-Bye Sky-High I.Q. Murder Case, The|, index 257, mark 0.82, reliability 0.261391,

75 peers 264|134|200|169|114|297| 944561308380|||0|||starting animation 944561309150|||770|||animation done - held 120 944561310310|||1160|||Question 12 feedback G.2 An Average Session 201

Correct

80 944561310420|||110|||presented Question 13 id ’t--certain-like-0’ 944561324860|||14440|||clicked on |Star Wars: Episode I - The Phantom Menace|, index 296, mark 0.77, reliability 1.0, peers 298|54|293|32|245|136| 944561324860|||0|||starting animation

85 944561325740|||880|||animation done - held 240 944561328270|||2530|||Question 13 feedback The following question was going well. The search for ’day’ was completed, but the participant failed to select anything once the search was completed.

90 944561328380|||110|||presented Question 14 id ’t-harddaysearch---0’ 944561348370|||19990|||clicked on nothing 944561353970|||5600|||searched for day 944561354030|||60|||starting animation 944561354850|||820|||animation done - held 180

95 944561358810|||3960|||Question 14 feedback The following question was answered incorrectly. I cannot ascertaion as to the reason. 944561358910|||100|||presented Question 15 id ’t-easy-certain-like-0’ 944561389620|||30710|||clicked on |Sabrina|, index 135, mark 0.77,

100 reliability 0.713675, peers 34|289|151|73|134|150| 944561389620|||0|||starting animation 944561390500|||880|||animation done - held 220 944561393460|||2960|||Question 15 feedback 944561393570|||110|||presented Question 16 id ’’

105 This selection was not neccessary. Perhaps playing? 944561409340|||15770|||clicked on |To Dorothy a Son|, index 133, mark 0.5, reliability 0.0, peers 223|120|43|225|60|98| 944561409340|||0|||starting animation 944561410220|||880|||animation done - held 120

110 944561412580|||2360|||clicked on |Million Pound Note, The|, index 129, mark 0.69, reliability 0.370702, peers 226|24|294|258|249|21| 944561412630|||50|||starting animation 944561413400|||770|||animation done - held 160

115 944561419610|||6210|||Question 16 feedback 944561419720|||110|||presented Question 17 id ’’ Now playing with the slider. 944561446250|||26530|||slider to 51 944561447400|||1150|||slider to 82

120 944561448500|||1100|||slider to 81 944561451130|||2630|||slider to 82 202 Movie Experiment Logs

944561452230|||1100|||slider to 100 944561461300|||9070|||Question 17 feedback 944561461410|||110|||presented Question 18 id ’t-spartacu-0’

125 944561473540|||12130|||searched for Spartacus 944561473540|||0|||starting animation 944561474260|||720|||animation done - held 120 Participant managed search and select this time. 944561481510|||7250|||clicked on |Spartacus|, index 169, mark 0.79,

130 reliability 0.804729, peers 102|151|77|257|188|283| 944561481510|||0|||starting animation 944561482280|||770|||animation done - held 120 944561485460|||3180|||slider to 99 944561486670|||1210|||slider to 85

135 944561488810|||2140|||slider to 84 944561502760|||13950|||Question 18 feedback 944561502820|||60|||presented Question 19 id ’doneBefore’ 944561531660|||28840|||Question 19 response -1 944561531660|||0|||Question 19 feedback

140 944561531710|||50|||presented Question 20 id ’’ 944561538800|||7090|||Question 20 feedback 944561538850|||50|||--- moving to new data set --- 944561538850|||0|||model is http://people.gmp.usyd.edu.au/hemul/ sq2p/gendata/movies500.rdf

145 944561542420|||3570|||model size 500 944561542420|||0|||maxPeers 6 min 4 average 5.996 944561543350|||930|||set topic to |Directed by William Wyler|, index 476, mark 0.95, reliability 0.261391, peers 47|250|54|167|196|209|

150 944561543410|||60|||starting animation 944561544620|||1210|||animation done - held 0 944561544620|||0|||changed to display 0 - What you would like 944561545660|||1040|||presented Question 21 id ’’ 944561555710|||10050|||Question 21 feedback

155 944561555820|||110|||presented Question 22 id ’chourSelect’ 944561568790|||12970|||searched for Children 944561568790|||0|||starting animation 944561569830|||1040|||animation done - held 0 Right answer, but used a search when it should not have

160 been neccessary. This would have slowed the answer down. 944561571200|||1370|||clicked on |Children’s Hour, The|, index 250, mark 0.81, reliability 0.499133, peers 180|47|306|61|476|106| 944561571200|||0|||starting animation 944561572250|||1050|||animation done - held 0 G.2 An Average Session 203

165 944561572740|||490|||Question 22 feedback 944561572850|||110|||presented Question 23 id ’carrieSelect’ Correct again. 944561581310|||8460|||clicked on |Carrie|, index 180, mark 0.7, reliability 0.332029, peers 187|47|250|373|384|376|

170 944561581360|||50|||starting animation 944561582410|||1050|||animation done - held 0 944561584110|||1700|||Question 23 feedback The next question wanted the participant to select a movie ‘similar’ to the previous selection, with a particular

175 recommendation. This participant selects another movie, this making it hard to do this comparison. 944561584160|||50|||presented Question 24 id ’whiffsSelect’ 944561600920|||16760|||slider to 49 944561602130|||1210|||slider to 65

180 944561618820|||16690|||slider to 66 944561620030|||1210|||slider to 78 944561644030|||24000|||clicked on |Smash-Up, the Story of a Woman|, index 131, mark 0.81, reliability 0.244367, peers 100|104|153|201|426|442|

185 944561644030|||0|||starting animation 944561645080|||1050|||animation done - held 0 944561647440|||2360|||clicked on |Something in the Wind|, index 133, mark 0.47, reliability 0.164016, peers 137|58|69|162|114|118|

190 944561647440|||0|||starting animation 944561648650|||1210|||animation done - held 0 944561653480|||4830|||clicked on |Paradine Case, The|, index 132, mark 0.62, reliability 0.482661, peers 119|120|195|216|144|245| 944561653480|||0|||starting animation

195 944561654520|||1040|||animation done - held 0 944561665840|||11320|||searched for Carrie 944561665890|||50|||starting animation 944561666940|||1050|||animation done - held 0 The re-selection of Carrie so that there can be the comparison.

200 944561667600|||660|||clicked on |Carrie|, index 180, mark 0.7, reliability 0.332029, peers 187|47|250|373|384|376| 944561667650|||50|||starting animation 944561668690|||1040|||animation done - held 0 944561671770|||3080|||slider to 79

205 944561672920|||1150|||slider to 86 944561675400|||2480|||slider to 85 944561676660|||1260|||slider to 84 204 Movie Experiment Logs

944561677870|||1210|||slider to 23 944561679020|||1150|||slider to 0

210 944561680230|||1210|||slider to 1 944561681440|||1210|||slider to 2 944561682650|||1210|||slider to 37 Selected something with almost the right recommendation, but it’s not a sibling of ‘Carrie’. Wrong.

215 944561719890|||37240|||clicked on |Something in the Wind|, index 133, mark 0.47, reliability 0.164016, peers 137|58|69|162|114|118| 944561719890|||0|||starting animation 944561721090|||1200|||animation done - held 0

220 944561721970|||880|||Question 24 feedback 944561722080|||110|||presented Question 25 id ’lowReliab’ Correct. 944561728780|||6700|||clicked on |What Next, Corporal Hargrove?|, index 118, mark 0.5, reliability 0.0,

225 peers 121|133|149|69|155|164| 944561728780|||0|||starting animation 944561729830|||1050|||animation done - held 0 944561730820|||990|||Question 25 feedback 944561730930|||110|||presented Question 26 id ’highReliab’

230 Correct. 944561746190|||15260|||clicked on |Oh Dad, Poor Dad, Mama’s Hung You in the Closet and I’m Feeling So Sad|, index 313, mark 0.47, reliability 0.294555, peers 289|85|170|287|72|415| 944561746190|||0|||starting animation

235 944561747240|||1050|||animation done - held 0 944561771790|||24550|||Question 26 feedback The slidder was a little "sticky" so this took me a while just because I over-slid it. Is this a comment about the previous questions? There

240 was no slider used in this question. 944561771900|||110|||presented Question 27 id ’papillonSelect’ 944561782500|||10600|||searched for Papi 944561782500|||0|||starting animation 944561783710|||1210|||animation done - held 0

245 944561784480|||770|||clicked on |Papillon|, index 364, mark 0.76, reliability 0.773986, peers 345|247|325|434|57|56| 944561784480|||0|||starting animation 944561785580|||1100|||animation done - held 0 944561787280|||1700|||Question 27 feedback

250 944561787390|||110|||presented Question 28 id ’rhSelect’ G.2 An Average Session 205

944561797060|||9670|||searched for Roman 944561797060|||0|||starting animation 944561798150|||1090|||animation done - held 0 944561800350|||2200|||clicked on |Roman Holiday|, index 187,

255 mark 0.8, reliability 0.764539, peers 236|276|345|8|180|233| 944561800350|||0|||starting animation 944561801450|||1100|||animation done - held 0 944561802330|||880|||Question 28 feedback 944561802440|||110|||presented Question 29 id ’anotherCert’

260 944561811500|||9060|||searched for it 944561811500|||0|||starting animation 944561812600|||1100|||animation done - held 0 944561830830|||18230|||clicked on |Witness for the Prosecution|, index 216, mark 0.83, reliability 0.670908,

265 peers 218|132|196|222|245|149| 944561830830|||0|||starting animation 944561831930|||1100|||animation done - held 0 944561832650|||720|||Question 29 feedback 944561832760|||110|||presented Question 30 id ’smSelect’

270 944561840720|||7960|||searched for Superman 944561840720|||0|||starting animation 944561841930|||1210|||animation done - held 0 944561842920|||990|||clicked on nothing 944561848850|||5930|||clicked on |Superman II|, index 436,

275 mark 0.63, reliability 0.788709, peers 485|411|483|488|479|403| 944561848850|||0|||starting animation 944561849890|||1040|||animation done - held 0 944561851160|||1270|||Question 30 feedback 944561851210|||50|||presented Question 31 id ’sbSelect’

280 944561859620|||8410|||searched for Spell 944561859620|||0|||starting animation 944561860660|||1040|||animation done - held 0 944561862640|||1980|||clicked on |Spellbound|, index 120, mark 0.76, reliability 0.658002, peers 113|119|485|169|135|132|

285 944561862640|||0|||starting animation 944561863850|||1210|||animation done - held 0 944561864670|||820|||Question 31 feedback 944561864780|||110|||presented Question 32 id ’tochSelect’ We were after a related movie, but this isn’t. wrong.

290 944561877850|||13070|||clicked on |Best Years of Our Lives, The|, index 125, mark 0.84, reliability 0.710092, peers 199|162|94|103|85|257| 944561877850|||0|||starting animation 206 Movie Experiment Logs

944561878890|||1040|||animation done - held 0

295 944561880050|||1160|||Question 32 feedback 944561880100|||50|||presented Question 33 id ’rec80’ 944561890870|||10770|||clicked on |Best Years of Our Lives, The|, index 125, mark 0.84, reliability 0.710092, peers 199|162|94|103|85|257|

300 944561893340|||2470|||clicked on |Best Years of Our Lives, The|, index 125, mark 0.84, reliability 0.710092, peers 199|162|94|103|85|257| 944561896470|||3130|||clicked on |Best Years of Our Lives, The|, index 125, mark 0.84, reliability 0.710092,

305 peers 199|162|94|103|85|257| Correct response. I’ve no idea why the three selections. 944561898230|||1760|||Question 33 feedback 944561898280|||50|||presented Question 34 id ’highRec’ 944561904380|||6100|||slider to 40

310 944561905590|||1210|||slider to 100 944561942720|||37130|||slider to 99 944561943930|||1210|||slider to 92 944561945130|||1200|||slider to 91 944561946400|||1270|||slider to 90

315 944561947610|||1210|||slider to 89 944561948810|||1200|||slider to 88 944561950020|||1210|||slider to 87 944561951070|||1050|||slider to 88 944561952710|||1640|||slider to 89

320 944561953920|||1210|||slider to 79 944561955130|||1210|||slider to 81 944561956390|||1260|||slider to 80 944561960070|||3680|||clicked on |Way to the Stars, The|, index 122, mark 0.84, reliability 0.300064,

325 peers 288|72|95|184|186|165| 944561960070|||0|||starting animation 944561961340|||1270|||animation done - held 0 944561983470|||22130|||clicked on |Godfather Trilogy: 1901-1980, The|, index 488, mark 0.91, reliability 0.527496,

330 peers 436|411|483|463|268|479| 944561983470|||0|||starting animation 944561984520|||1050|||animation done - held 0 944561987700|||3180|||clicked on |Directed by William Wyler|, index 476, mark 0.95, reliability 0.261391,

335 peers 47|250|54|167|196|209| 944561987760|||60|||starting animation G.2 An Average Session 207

944561988800|||1040|||animation done - held 0 Correct. Again, a double selection for some reason. 944561990280|||1480|||clicked on |Directed by William Wyler|,

340 index 476, mark 0.95, reliability 0.261391, peers 47|250|54|167|196|209| 944561992370|||2090|||Question 34 feedback 944561992480|||110|||presented Question 35 id ’rec81’ 944562001870|||9390|||slider to 75

345 944562003030|||1160|||slider to 10 944562011760|||8730|||slider to 11 944562012970|||1210|||slider to 26 944562015490|||2520|||slider to 27 944562023180|||7690|||clicked on |Grande attacco, Il|, index 407,

350 mark 0.4, reliability 0.164016, peers 412|296|401|395|410|398| 944562023180|||0|||starting animation 944562024280|||1100|||animation done - held 0 944562029830|||5550|||clicked on |Tender Comrade|, index 110, mark 0.62, reliability 0.319535, peers 80|116|104|362|83|293|

355 944562029830|||0|||starting animation 944562031040|||1210|||animation done - held 0 We’re starting to see ‘hunting’ for an answer. 944562032740|||1700|||clicked on |Remarkable Andrew, The|, index 104, mark 0.63, reliability 0.211914,

360 peers 80|116|196|131|413|110| 944562032740|||0|||starting animation 944562033840|||1100|||animation done - held 0 944562034550|||710|||clicked on |Eagle Squadron|, index 101, mark 0.3, reliability 0.164016, peers 173|451|456|462|469|468|

365 944562034550|||0|||starting animation 944562035600|||1050|||animation done - held 0 Correct. 944562039000|||3400|||clicked on |Stitches|, index 469, mark 0.28, reliability 0.300064, peers 108|101|117|183|280|142|

370 944562039000|||0|||starting animation 944562040210|||1210|||animation done - held 0 944562041200|||990|||Question 35 feedback 944562041250|||50|||presented Question 36 id ’lowRec’ 944562049820|||8570|||slider to 26

375 944562051030|||1210|||slider to 24 944562052180|||1150|||slider to 19 944562054270|||2090|||clicked on |"Masada"|, index 441, mark 0.5, reliability 0.0, peers 497|355|361|490|415|474| 944562054330|||60|||starting animation 208 Movie Experiment Logs

380 944562055370|||1040|||animation done - held 0 944562055750|||380|||clicked on |Goliath Awaits|, index 448, mark 0.55, reliability 0.365192, peers 62|406|418|423|433|432| 944562055810|||60|||starting animation 944562056850|||1040|||animation done - held 0

385 944562057510|||660|||clicked on |"Masada"|, index 441, mark 0.5, reliability 0.0, peers 497|355|361|490|415|474| 944562057510|||0|||starting animation 944562058780|||1270|||animation done - held 0 944562060040|||1260|||clicked on |Monster Island|, index 443,

390 mark 0.2, reliability 0.261391, peers 400|372|91|458|414|363| 944562060040|||0|||starting animation 944562061080|||1040|||animation done - held 0 944562068990|||7910|||slider to 18 944562070200|||1210|||slider to 14

395 944562071410|||1210|||slider to 13 More hunting, and an incorrect (although close) answer. 944562074760|||3350|||clicked on |Monster Island|, index 443, mark 0.2, reliability 0.261391, peers 400|372|91|458|414|363| 944562076300|||1540|||Question 36 feedback

400 944562076410|||110|||presented Question 37 id ’highRel’ 944562085080|||8670|||clicked on |Star Wars: Episode I - The Phantom Menace|, index 498, mark 0.77, reliability 1.0, peers 491|85|170|492|334|174| 944562085080|||0|||starting animation

405 944562086290|||1210|||animation done - held 0 944562087780|||1490|||Question 37 feedback 944562087830|||50|||presented Question 38 id ’manRel’ 944562095910|||8080|||searched for man 944562095960|||50|||starting animation

410 944562097000|||1040|||animation done - held 0 944562104860|||7860|||slider to 14 944562106070|||1210|||slider to 100 944562107270|||1200|||slider to 84 944562108480|||1210|||slider to 83

415 944562145940|||37460|||slider to 82 944562147100|||1160|||slider to 71 944562156160|||9060|||clicked on |Smash-Up, the Story of a Woman|, index 131, mark 0.81, reliability 0.244367, peers 100|104|153|201|426|442|

420 Correct answer. The search+scan pattern was a good solution. The use of the slider is possibly a waste of time. 944562156210|||50|||starting animation

944562157260|||1050|||animation done - held 0 944562159180|||1920|||Question 38 feedback

425 944562159230|||50|||presented Question 39 id ’otherComments’ 944562261230|||102000|||Question 39 response I would have liked to be able to zoom in without acutally selecting a new movie, I guess this may be because I have less than perfect eye-sight, but sometimes it would have been nice to be able

430 to look either side of something before selecting it and causing a shift in all the titles. However, despite this, it was really impressively easy to find things and see what would be recommended.... 944562261230|||0|||Question 39 feedback

435 944562261340|||110|||--rlog save--

G.3 A Poor Session

945391426744.log

#log format v 2.1 945483335630|||50|||start session 945483335630|||0|||doAnimate to true 945483338380|||2750|||model is http://People.gmp.usyd.edu.au/hemul/

5 sq2p/gendata/movies300.rdf 945483349750|||11370|||model size 300 945483349750|||0|||maxPeers 6 min 6 average 6.0 945483350570|||820|||set topic to |Directed by William Wyler|, index 289, mark 0.95, reliability 0.261391,

10 peers 33|171|34|121|135|147| 945483350680|||110|||starting animation 945483351120|||440|||animation done - held 440 945483351120|||0|||changed to display 0 - What you would like 945483352600|||1480|||presented Question 0 id ’t-welcome’

15 945483353100|||500|||resize to x=350, y=645 945483416040|||62940|||Question 0 feedback 945483416590|||550|||presented Question 1 id ’age’ 945483423400|||6810|||Question 1 response 52 945483423400|||0|||Question 1 feedback

20 945483423900|||500|||presented Question 2 id ’’ 945483432460|||8560|||Question 2 feedback 945483432790|||330|||presented Question 3 id ’’ 945483436470|||3680|||Question 3 feedback 945483436860|||390|||--- resetting ---

25 945483436860|||0|||changed to display 0 - What you would like 945483437680|||820|||set topic to |Directed by William Wyler|, index 289, mark 0.95, reliability 0.261391, peers 33|171|34|121|135|147| 945483437680|||0|||--- reset ---

30 945483437790|||110|||presented Question 4 id ’’ 945483446800|||9010|||Question 4 feedback 945483447180|||380|||presented Question 5 id ’’ 945483468390|||21210|||Question 5 feedback 945483468720|||330|||--- resetting ---

35 945483468720|||0|||changed to display 0 - What you would like 945483469540|||820|||set topic to |Directed by William Wyler|, index 289, mark 0.95, reliability 0.261391, peers 33|171|34|121|135|147| 945483469590|||50|||--- reset ---

40 945483469700|||110|||presented Question 6 id ’sabrina’ Correct. 945483473770|||4070|||clicked on |Sabrina|, index 135, mark 0.77, reliability 0.713675, peers 34|289|151|73|134|150| 945483473770|||0|||starting animation

45 945483474700|||930|||animation done - held 80 945483496670|||21970|||Question 6 feedback 945483497060|||390|||presented Question 7 id ’’ 945483520400|||23340|||Question 7 feedback 945483520730|||330|||presented Question 8 id ’notlike1’

50 945483531330|||10600|||clicked on |Love Lottery, The|, index 134, mark 0.5, reliability 0.0, peers 34|127|192|135|257|113| 945483531390|||60|||starting animation 945483532320|||930|||animation done - held 40 945483537650|||5330|||clicked on |Gone to Earth|, index 114,

55 mark 0.53, reliability 0.261391, peers 148|95|290|261|206|257| 945483537650|||0|||starting animation 945483538580|||930|||animation done - held 80 945483544350|||5770|||clicked on |So Evil My Love|, index 104, mark 0.73, reliability 0.198306, peers 102|194|62|54|292|108|

60 945483544350|||0|||starting animation 945483545280|||930|||animation done - held 80 The participant is having to hunt. 945483554070|||8790|||clicked on |Ridin’ for Love|, index 2, mark 0.5, reliability 0.0, peers 130|0|3|4|5|6|

65 945483554070|||0|||starting animation 945483554950|||880|||animation done - held 40 But got it correct in the end.

945483561590|||6640|||clicked on |Whiffs|, index 248, mark 0.34, reliability 0.211914, peers 190|126|142|242|250|244|

70 945483561590|||0|||starting animation 945483562530|||940|||animation done - held 80 945483567090|||4560|||Question 8 feedback 945483567580|||490|||presented Question 9 id ’trecommendRadio’ 945483577360|||9780|||Question 9 response 34

75 945483577360|||0|||Question 9 feedback 945483577630|||270|||presented Question 10 id ’’ 945483613500|||35870|||Question 10 feedback 945483613880|||380|||presented Question 11 id ’tunc’ 945483618060|||4180|||clicked on |Girl Rush, The|, index 142,

80 mark 0.5, reliability 0.0, peers 190|242|248|250|244|251| 945483618060|||0|||starting animation 945483618940|||880|||animation done - held 40 945483630420|||11480|||Question 11 feedback 945483630750|||330|||presented Question 12 id ’tunc 0’

85 This is almost correct. We were after a mark >= 0.7. 945483773500|||142750|||clicked on |Another Time, Another Place|, index 157, mark 0.66, reliability 0.182596, peers 78|17|27|246|160|75| 945483773500|||0|||starting animation

90 945483774430|||930|||animation done - held 80 945483779540|||5110|||Question 12 feedback 945483779920|||380|||presented Question 13 id ’t--certain-like-0’ 945483787060|||7140|||clicked on |Breakfast at Tiffany’s|, index 176, mark 0.77, reliability 0.810693,

95 peers 256|175|209|206|293|36| 945483787060|||0|||starting animation 945483788050|||990|||animation done - held 0 945483789700|||1650|||Question 13 feedback 945483790030|||330|||presented Question 14 id ’t-harddaysearch---0’

100 945483817160|||27130|||searched for day 945483817220|||60|||starting animation 945483818100|||880|||animation done - held 80 945483820460|||2360|||clicked on |Roman Holiday|, index 130, mark 0.8, reliability 0.764539, peers 162|190|224|2|126|161|

105 945483820460|||0|||starting animation 945483821450|||990|||animation done - held 80 945483823040|||1590|||Question 14 feedback 945483823590|||550|||presented Question 15 id ’t-easy-certain-like-0’ 945483834030|||10440|||clicked on |Big Country, The|, index 161,

110 mark 0.77, reliability 0.572028, peers 130|146|298|83|86|91|

945483834030|||0|||starting animation 945483834960|||930|||animation done - held 80 945483836880|||1920|||Question 15 feedback 945483837760|||880|||presented Question 16 id ’’

115 945483846110|||8350|||clicked on |Web, The|, index 96, mark 0.76, reliability 0.282551, peers 86|221|61|117|101|217| 945483846110|||0|||starting animation 945483847040|||930|||animation done - held 80 945483848750|||1710|||clicked on |Keys of the Kingdom, The|,

120 index 86, mark 0.75, reliability 0.332029, peers 161|83|137|148|66|96| 945483848750|||0|||starting animation 945483849680|||930|||animation done - held 80 945483852750|||3070|||Question 16 feedback

125 945483853140|||390|||presented Question 17 id ’’ 945483870440|||17300|||slider to 51 945483871480|||1040|||slider to 100 945483879390|||7910|||clicked on |Thieves Fall Out|, index 63, mark 0.5, reliability 0.0, peers 64|65|292|246|97|180|

130 945483879390|||0|||starting animation 945483880270|||880|||animation done - held 40 945483883180|||2910|||clicked on |Lone Star, The|, index 18, mark 0.5, reliability 0.0, peers 9|20|198|243|197|75| 945483883180|||0|||starting animation

135 945483884120|||940|||animation done - held 80 945483904110|||19990|||Question 17 feedback 945483904660|||550|||presented Question 18 id ’t-spartacu-0’ 945483914710|||10050|||searched for sparta 945483914710|||0|||starting animation

140 945483915640|||930|||animation done - held 80 945483916850|||1210|||clicked on |Spartacus|, index 169, mark 0.79, reliability 0.804729, peers 102|151|77|257|188|283| 945483916910|||60|||starting animation 945483917790|||880|||animation done - held 80

145 945483925420|||7630|||slider to 98 945483926460|||1040|||slider to 7 945483928880|||2420|||slider to 8 945483930030|||1150|||slider to 100 945483932230|||2200|||slider to 98

150 945483933220|||990|||slider to 70 945483935200|||1980|||slider to 71 945483936190|||990|||slider to 100 945483943000|||6810|||Question 18 feedback

945483943490|||490|||presented Question 19 id ’doneBefore’

155 945483952720|||9230|||Question 19 response 0 945483952720|||0|||Question 19 feedback 945483953210|||490|||presented Question 20 id ’’ 945483959090|||5880|||Question 20 feedback 945483959480|||390|||--- moving to new data set ---

160 945483959480|||0|||model is http://People.gmp.usyd.edu.au/hemul/sq2p/gendata/movies500.rdf 945483971940|||12460|||model size 500 945483971940|||0|||maxPeers 6 min 4 average 5.996 945483972880|||940|||set topic to |Directed by William Wyler|,

165 index 476, mark 0.95, reliability 0.261391, peers 47|250|54|167|196|209| 945483972880|||0|||starting animation 945483974250|||1370|||animation done - held 0 945483974250|||0|||changed to display 0 - What you would like

170 945483975130|||880|||presented Question 21 id ’’ 945483980840|||5710|||Question 21 feedback 945483981280|||440|||presented Question 22 id ’chourSelect’ This search should not be necessary. 945483991550|||10270|||searched for children

175 945483991610|||60|||starting animation 945483992760|||1150|||animation done - held 0 945483993360|||600|||clicked on |Children’s Hour, The|, index 250, mark 0.81, reliability 0.499133, peers 180|47|306|61|476|106| 945483993360|||0|||starting animation

180 945483994570|||1210|||animation done - held 0 945483995560|||990|||Question 22 feedback 945483996000|||440|||presented Question 23 id ’carrieSelect’ 945484003310|||7310|||searched for carrie Nor should this one.

185 945484003310|||0|||starting animation 945484004680|||1370|||animation done - held 0 945484005060|||380|||clicked on nothing There was nothing selected after the search. A click was detected, but it was not on any topic. Did they

190 try to click the right answer and miss? 945484008470|||3410|||Question 23 feedback 945484008910|||440|||presented Question 24 id ’whiffsSelect’ Now they select it! 945484014890|||5980|||clicked on |Carrie|, index 180, mark 0.7,

195 reliability 0.332029, peers 187|47|250|373|384|376| 945484014890|||0|||starting animation

945484016100|||1210|||animation done - held 0 945484019070|||2970|||slider to 51 945484020110|||1040|||slider to 98

200 There was some attempt at using the slider, but no final selection. 945484033190|||13080|||Question 24 feedback 945484033510|||320|||presented Question 25 id ’lowReliab’ 945484044330|||10820|||clicked on |Strip-tease|, index 392,

205 mark 0.5, reliability 0.0, peers 255|164|269|460|262|102| 945484044330|||0|||starting animation 945484045710|||1380|||animation done - held 0 945484053780|||8070|||Question 25 feedback 945484054330|||550|||presented Question 26 id ’highReliab’

210 945484108100|||53770|||slider to 88 945484109150|||1050|||slider to 3 945484133420|||24270|||clicked on |"Man from U.N.C.L.E., The"|, index 291, mark 0.5, reliability 0.0, peers 144|228|278|279|333|300|

215 945484133420|||0|||starting animation 945484134690|||1270|||animation done - held 0 945484136500|||1810|||clicked on |How to Save a Marriage (And Ruin Your Life)|, index 318, mark 0.5, reliability 0.0, peers 463|147|474|41|50|342|

220 945484136500|||0|||starting animation 945484137870|||1370|||animation done - held 0 945484151110|||13240|||clicked on |Haunted Homestead, The|, index 10, mark 0.5, reliability 0.0, peers 29|13|19|26|103|20| 945484151110|||0|||starting animation

225 945484152320|||1210|||animation done - held 0 945484165720|||13400|||clicked on |Evergreen|, index 39, mark 0.49, reliability 0.198306, peers 148|457|252|277|409|16| 945484165720|||0|||starting animation 945484166930|||1210|||animation done - held 0

230 945484174780|||7850|||clicked on |Evergreen|, index 39, mark 0.49, reliability 0.198306, peers 148|457|252|277|409|16| Correct, but had to hunt. 945484176870|||2090|||Question 26 feedback 945484177360|||490|||presented Question 27 id ’papillonSelect’

235 945484185820|||8460|||searched for papillon 945484185820|||0|||starting animation 945484187200|||1380|||animation done - held 0 945484188020|||820|||clicked on nothing 945484191480|||3460|||clicked on |Papillon|, index 364, mark 0.76,

240 reliability 0.773986, peers 345|247|325|434|57|56| 945484191480|||0|||starting animation 945484192740|||1260|||animation done - held 0 945484193680|||940|||Question 27 feedback 945484194170|||490|||presented Question 28 id ’rhSelect’

245 945484200710|||6540|||searched for roman ho The title should have been exposed. Search should not be needed. 945484200710|||0|||starting animation 945484201920|||1210|||animation done - held 0

250 945484202790|||870|||clicked on |Roman Holiday|, index 187, mark 0.8, reliability 0.764539, peers 236|276|345|8|180|233| 945484202850|||60|||starting animation 945484204170|||1320|||animation done - held 0 945484204660|||490|||Question 28 feedback

255 945484205210|||550|||presented Question 29 id ’anotherCert’ 945484215370|||10160|||searched for it 945484215430|||60|||starting animation 945484216640|||1210|||animation done - held 0 945484232620|||15980|||clicked on |Witness for the Prosecution|,

260 index 216, mark 0.83, reliability 0.670908, peers 218|132|196|222|245|149| 945484232620|||0|||starting animation 945484233830|||1210|||animation done - held 0 945484234820|||990|||Question 29 feedback

265 945484235150|||330|||presented Question 30 id ’smSelect’ 945484243820|||8670|||searched for superman iI 945484243820|||0|||starting animation 945484245200|||1380|||animation done - held 0 945484246080|||880|||clicked on |Superman II|, index 436,

270 mark 0.63, reliability 0.788709, peers 485|411|483|488|479|403| 945484246080|||0|||starting animation 945484247340|||1260|||animation done - held 0 945484247780|||440|||Question 30 feedback 945484248110|||330|||presented Question 31 id ’sbSelect’

275 945484255850|||7740|||searched for spellb 945484255910|||60|||starting animation 945484257060|||1150|||animation done - held 0 945484257720|||660|||clicked on |Spellbound|, index 120, mark 0.76, reliability 0.658002, peers 113|119|485|169|135|132|

280 945484257720|||0|||starting animation 945484258980|||1260|||animation done - held 0 945484259590|||610|||Question 31 feedback 216 Movie Experiment Logs

945484260080|||490|||presented Question 32 id ’tochSelect’ 945484270520|||10440|||slider to 4

285 945484271560|||1040|||slider to 100 945484310780|||39220|||clicked on |Spellbound|, index 120, mark 0.76, reliability 0.658002, peers 113|119|485|169|135|132| 945484312320|||1540|||Question 32 feedback #COMMENT|||just clicked on spellbound again. wierd.

290 945484312650|||330|||presented Question 33 id ’rec80’ 945484332030|||19380|||clicked on |You’re in the Navy Now|, index 170, mark 0.72, reliability 0.234654, peers 298|313|287|498|38|36| 945484332030|||0|||starting animation

295 945484333300|||1270|||animation done - held 0 945484364610|||31310|||clicked on |Brother Rat|, index 59, mark 0.64, reliability 0.356325, peers 76|91|298|287|225|405| 945484364610|||0|||starting animation 945484365980|||1370|||animation done - held 0

300 945484372300|||6320|||clicked on |These Three|, index 47, mark 0.82, reliability 0.37593, peers 180|54|250|476|5|3| 945484372300|||0|||starting animation 945484373500|||1200|||animation done - held 0 945484384710|||11210|||Question 33 feedback

305 945484385090|||380|||presented Question 34 id ’highRec’ 945484420350|||35260|||clicked on |Rosebud|, index 380, mark 0.53, reliability 0.223917, peers 205|244|147|107|409|397| 945484420350|||0|||starting animation 945484421620|||1270|||animation done - held 0

310 945484450510|||28890|||clicked on |Evergreen|, index 39, mark 0.49, reliability 0.198306, peers 148|457|252|277|409|16| 945484450560|||50|||starting animation 945484451770|||1210|||animation done - held 0 945484455290|||3520|||clicked on ||,

315 index 38, mark 0.67, reliability 0.253234, peers 299|372|397|170|145|97| 945484455290|||0|||starting animation 945484456550|||1260|||animation done - held 0 945484489070|||32520|||slider to 98

320 945484490330|||1260|||slider to 0 945484509060|||18730|||slider to 1 945484510100|||1040|||slider to 100 945484513950|||3850|||clicked on |Aunt Sally|, index 37, mark 0.5, reliability 0.0, peers 71|40|42|97|41|338|

325 945484513950|||0|||starting animation

945484515210|||1260|||animation done - held 0 945484552670|||37460|||slider to 99 945484553770|||1100|||slider to 92 945484555030|||1260|||slider to 93

330 945484558990|||3960|||clicked on |Godfather Trilogy: 1901-1980, The|, index 488, mark 0.91, reliability 0.527496, peers 436|411|483|463|268|479| Close. 945484559040|||50|||starting animation

335 945484560250|||1210|||animation done - held 0 945484563550|||3300|||Question 34 feedback 945484563990|||440|||presented Question 35 id ’rec81’ 945484571840|||7850|||slider to 92 945484572940|||1100|||slider to 10

340 945484574750|||1810|||slider to 9 945484575850|||1100|||slider to 6 945484586940|||11090|||slider to 7 945484588150|||1210|||slider to 21 945484610400|||22250|||slider to 22

345 945484611500|||1100|||slider to 29 945484626050|||14550|||clicked on |Legend of the Werewolf|, index 383, mark 0.33, reliability 0.261391, peers 374|224|204|366|143|317| 945484626050|||0|||starting animation

350 945484627310|||1260|||animation done - held 0 945484637040|||9730|||clicked on |Executioner’s Song, The|, index 455, mark 0.69, reliability 0.432959, peers 447|331|359|178|382|352| 945484637040|||0|||starting animation

355 945484638410|||1370|||animation done - held 0 945484651700|||13290|||clicked on |Billy Two Hats|, index 360, mark 0.33, reliability 0.211914, peers 319|451|188|193|220|237| 945484651700|||0|||starting animation 945484652910|||1210|||animation done - held 0

360 945484656420|||3510|||slider to 27 945484657520|||1100|||slider to 8 945484659230|||1710|||clicked on |Little Foxes, The|, index 94, mark 0.78, reliability 0.488734, peers 43|81|125|109|45|60| 945484659230|||0|||starting animation

365 945484660430|||1200|||animation done - held 0 945484668180|||7750|||clicked on |Eagle Squadron|, index 101, mark 0.3, reliability 0.164016, peers 173|451|456|462|469|468| 945484668180|||0|||starting animation

945484669550|||1370|||animation done - held 0

370 945484675760|||6210|||clicked on |To Dorothy a Son|, index 190, mark 0.5, reliability 0.0, peers 348|163|162|288|184|264| 945484675810|||50|||starting animation 945484677020|||1210|||animation done - held 0 945484678610|||1590|||clicked on |Fast Company|, index 58,

375 mark 0.5, reliability 0.0, peers 100|121|133|149|153|157| 945484678610|||0|||starting animation 945484679880|||1270|||animation done - held 0 945484680210|||330|||clicked on |Arkansas Judge|, index 86, mark 0.5, reliability 0.0, peers 67|388|444|136|326|98|

380 945484680210|||0|||starting animation 945484681640|||1430|||animation done - held 0 945484687130|||5490|||slider to 11 945484688230|||1100|||slider to 84 945484691410|||3180|||slider to 83

385 945484692460|||1050|||slider to 53 945484694930|||2470|||clicked on |"Here’s Lucy"|, index 319, mark 0.5, reliability 0.0, peers 62|159|360|353|458|139| 945484694930|||0|||starting animation 945484696140|||1210|||animation done - held 0

390 945484696790|||650|||clicked on |Billy Two Hats|, index 360, mark 0.33, reliability 0.211914, peers 319|451|188|193|220|237| 945484696790|||0|||starting animation 945484698000|||1210|||animation done - held 0 945484710090|||12090|||slider to 49

395 945484711130|||1040|||slider to 36 945484717120|||5990|||slider to 35 945484718220|||1100|||slider to 27 945484724530|||6310|||clicked on |Eagle Squadron|, index 101, mark 0.3, reliability 0.164016, peers 173|451|456|462|469|468|

400 945484724530|||0|||starting animation 945484725740|||1210|||animation done - held 0 945484748310|||22570|||clicked on |Stitches|, index 469, mark 0.28, reliability 0.300064, peers 108|101|117|183|280|142| Correct, but after a lot of hunting.

405 945484748310|||0|||starting animation 945484749740|||1430|||animation done - held 0 945484750730|||990|||Question 35 feedback 945484751120|||390|||presented Question 36 id ’lowRec’ 945484755290|||4170|||Question 36 feedback

410 No selection. Based on the previous answer it was close. 945484755620|||330|||presented Question 37 id ’highRel’

945484761170|||5550|||clicked on |Lawrence of Arabia|, index 265, mark 0.86, reliability 0.885996, peers 200|339|322|177|295|440| 945484761170|||0|||starting animation

415 945484762430|||1260|||animation done - held 0 945484764410|||1980|||Question 37 feedback Close-ish, but incorrect. 945484764790|||380|||presented Question 38 id ’manRel’ 945484772370|||7580|||searched for man

420 945484772430|||60|||starting animation 945484773640|||1210|||animation done - held 0 945484785610|||11970|||clicked on |Superman|, index 411, mark 0.68, reliability 0.840705, peers 485|436|268|488|403|208| 945484785610|||0|||starting animation

425 945484787040|||1430|||animation done - held 0 945484787200|||160|||Question 38 feedback Started well, but didn’t get the most certain. 945484787700|||500|||presented Question 39 id ’otherComments’ 945484882220|||94520|||Question 39 response The choice of colours

430 will alientate about 30% of the population who are red/green colorblind. choosing yellow for unknown is too close to green (on this screen) blue might be better 945484882220|||0|||Question 39 feedback 945484882550|||330|||--rlog save--
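Each record in these logs has the same shape: an absolute timestamp in milliseconds, the number of milliseconds since the previous record, and a free-text event description, separated by '|||'; lines beginning with '#' are comments. As a rough illustration of how such a log can be processed, the following sketch is mine rather than part of the experimental software; the Event class and its field names are invented, and it assumes one record per line, as in the original log files (the listings above are wrapped for the page).

    import sys
    from dataclasses import dataclass

    @dataclass
    class Event:
        stamp_ms: int   # absolute time in milliseconds (Unix epoch)
        delta_ms: int   # milliseconds since the previous event
        text: str       # free-text event description

    def parse_rlog(lines):
        """Yield Event records from an rlog v2.1 stream, skipping
        '#' comment lines and any interleaved annotation text."""
        for line in lines:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            parts = line.split('|||', 2)
            if len(parts) != 3 or not parts[0].isdigit():
                continue
            yield Event(int(parts[0]), int(parts[1]), parts[2])

    if __name__ == '__main__':
        # Example: print the interval that preceded each feedback event.
        for ev in parse_rlog(open(sys.argv[1])):
            if 'feedback' in ev.text:
                print(ev.delta_ms, 'ms before', ev.text)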

Appendix H

Worked Examples of Tasks

An example of an EASY task can be found in Figure 6.1 on page 102. Similarly, an example of a COUSIN task may be found in Figure 6.2 on page 103.

Figure H.1: A HARD task.

In the example of a HARD task in Figure H.1, the participant must select a title with ‘day’ in it. Using the search facility accessed from the Action menu, they search for ‘day’. VlUM 2.0 displays the titles that contain that substring. Finally, the user selects one of the titles.


Figure H.2: A REC task.

Figure H.2 shows a REC task. The participant must select a title with a recommendation < 40%. First they set the slider to the 40% mark. They then find a title that is still red (the white title in the centre pane is the one being looked at; it is white because it is under the mouse pointer). Finally that title is selected.

In Figure H.3 on the facing page the participant must select a title with a certainty > 80%. First they use the mouse to look at the certainty of titles that are close to the left side of the display. The white title is the one currently being inspected. In the upper left image Lawrence of Arabia is being inspected, and the status bar at the bottom of the display shows the score and certainty for the title. When the user finds a suitable title, they select it.

Figure H.3: A CERT task.

Figure H.4: A CERTEASY task.

The CERTEASY task in Figure H.4 requires the participant to select the most certain title that is a peer of the currently selected title. First they use the mouse to look at the certainty of titles that are peers of the current selection, and are therefore well spaced and larger. The white title is the one currently being inspected. They test all the titles to find the most certain one, then select it. In Figure H.5 on the facing page the participant must select the most certain title that contains the letters ‘it’. First they search for the titles containing those letters, and then use the mouse to check the certainty of the titles shown by the search. The most certain title is found and selected.

The HARDREC task in Figure H.6 on the next page requires the participant to select the title with the least recommendation that contains the letters ‘at’. First they search for the titles containing those letters, giving the middle screen above. They then use the mouse to check the recommendation of the titles shown by the search. The least recommended title is found and selected in the rightmost screen. In Figure H.7 on page 226 the participant must select the most recommended title. First the slider is moved to maximum, and then reduced until some green titles can be seen

Figure H.5: A CERTHARD task.

Figure H.6: A HARDREC task.

(middle screen in the figure). These green titles are then examined to see which has the maximum recommendation, as can be seen in the middle screen above, where the title in white (Directed by William Wyler) is displayed in the status bar with its score and certainty. This stage is difficult because of the clutter. Finally, the most recommended title is selected.

The task in Figure H.8 on the next page, EASYREC, is to select the most recommended title from the peers of the currently selected title. First the slider is moved to maximum, and then moved down until one or more of the peers turns green, as can be seen in the top middle screen above. In this example three of them turn green at the same time because the colour scale is only accurate to 10%. The mouse is then used to search out the most

Figure H.7: A RECSLIDER task.

Figure H.8: An EASYREC task.

recommended from those that have just turned green, as seen in the third screen. Again, inspecting the status bar while moving the mouse pointer over titles is an effective method for this. Finally, the most recommended title that is a peer of the current selection is selected in the last screen.
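As noted above, the colour scale is only accurate to 10%, which is why several peers can turn green at the same slider position. The following is a minimal sketch of that quantised thresholding, as an illustration of the observed behaviour rather than the actual VlUM code; the function name and colour values are mine.

    def colour_for(mark, threshold):
        """Bucket a recommendation score to the nearest 10% before
        comparing it with the slider threshold, so nearby scores
        change colour together."""
        bucket = round(mark * 10) / 10
        return 'green' if bucket >= threshold else 'red'

    # With the slider at 70%, marks of 0.68 and 0.72 both bucket to
    # 0.7 and turn green at the same moment; 0.61 stays red.
    for mark in (0.68, 0.72, 0.61):
        print(mark, colour_for(mark, 0.7))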

In the CERTREC task in Figure H.9 the participant must select a title with certainty > 70% and recommendation > 70%. In the top middle screen the slider has been set to 70% so that all titles that might fit the condition are green. The mouse is then used to check titles that are green, and are somewhat towards the left of the display (and hence quite certain), as shown in the third screen. When a suitable title is found, it is selected, as shown in the last screen.

Figure H.9: A CERTREC task.

Figure H.10: An example of display stretching.

If the feature is enabled, the display may be stretched as shown in Figure H.10 by clicking on and dragging a topic. The layout algorithm then treats this topic as fixed at the place to which it was dragged. By dragging more than one topic, a cluttered area may be stretched out and the individual titles seen. In this example we stretch the area between Sabrina and War and Peace and select one of the titles between them (Moby Dick). The top middle screen shows the dragging of Sabrina, the top right screen shows the dragging of War and Peace with Sabrina already fixed, and the bottom screen shows the state after the final selection.

Sometimes stretching is time-consuming, and all a participant really wants to do is quickly expand a region. They can do this by clicking the second mouse button. VlUM then stretches the region used by the twenty topics around the position of the mouse click, as shown in Figure H.11 on the next page. In this example, we wish to unclutter the area around Roman Holiday. Clicking the second mouse button on Roman Holiday gives us the

Figure H.11: Quick stretching example.

second screen in the figure. From there we can select or inspect the newly exposed titles.

Appendix I

Dot Graphs

The layout algorithm used in VlUM 2.0 is only one way of drawing a graph. It is, however, particularly compact. As a contrast I used DOT, a popular graph drawing package, to produce Figures I.1 and I.2, which show the graphs used in the IMDB data set described in Section 3.2. Only the 100 and 300 item sets were drawn, as DOT was unable to draw larger sets on the machines available to me.
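For the curious, graphs of this kind are easy to reproduce. The sketch below writes a toy three-edge fragment of such a graph in DOT syntax and renders it with the dot program; the particular edges, the file names, and the PostScript output format are illustrative only, and the peer relation is treated here as an undirected edge.

    import subprocess

    # A toy fragment of a movie graph: titles as nodes, peer
    # relationships as (undirected) edges, in DOT syntax.
    edges = [
        ("Roman Holiday", "Sabrina"),
        ("Sabrina", "War and Peace"),
        ("Roman Holiday", "Ridin' for Love"),
    ]

    with open("movies.dot", "w") as f:
        f.write("graph movies {\n")
        for a, b in edges:
            f.write('  "%s" -- "%s";\n' % (a, b))
        f.write("}\n")

    # Render to PostScript with the dot layout engine.
    subprocess.run(["dot", "-Tps", "movies.dot", "-o", "movies.ps"],
                   check=True)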



Figure I.1: Graph of the 100 movie data set drawn with DOT.


Figure I.2: Graph of the 300 movie data set drawn with DOT. The movie titles become readable (although still small) if the page is expanded to 1600% of its current size.

Bibliography

[AA01] Lada A. Adamic and Eytan Adar. Friends and neighbors on the web, 2001. http://www.parc.xerox.com/istl/groups/iea/papers/web10/.

[AMY88] R. Akscyn, D. L. McCracken, and E. A. Yoder. KMS: A distributed hypermedia system for managing knowledge in organizations. Communications of the ACM, 31(7):820–835, July 1988.

[And93] J. R. Anderson. Rules of the Mind. Lawrence Erlbaum, New York, N.Y., 1993.

[AS94] Christopher Ahlberg and Ben Shneiderman. Visual information seeking: Tight coupling of dynamic query filters with starfield displays. In Proceedings of CHI'94, pages 313–317, 1994.

[AWS92] Christopher Ahlberg, Christopher Williamson, and Ben Shneiderman. Dynamic queries for information exploration: An implementation and evaluation. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pages 619–626, 1992.

[Ben00] Benjamin B. Bederson. Fisheye menus. In Proceedings of the ACM Conference on User Interface Software and Technology (UIST 2000), pages 217–226. ACM Press, 2000.

[Ber81] J. Bertin. Graphics and Graphic Information Processing, pages 62–81. 1981.

[BG99] Dan Brickley and R. V. Guha. Resource description framework (RDF) schema specification, 1999. http://www.w3.org/TR/PR-rdf-schema/.

[BLCL+94] Tim Berners-Lee, Robert Cailliau, Ari Luotonen, Henrik Frystyk Nielsen, and Arthur Secret. The World Wide Web. Communications of the ACM, 37(8):76–82, 1994.

[BLHL01] Tim Berners-Lee, Jim Hendler, and Ora Lassila. The semantic web. Scientific American, May 2001.


[Bos87] T. Bosser. Learning in man-computer interaction. Springer-Verlag, 1987.

[Bul97] Susan Bull. See Yourself Write: A simple student model to make students think. In Anthony Jameson, Cécile Paris, and Carlo Tasso, editors, User Modeling: Proceedings of the Sixth International Conference, UM97, New York, 1997. Springer.

[CA95] A. T. Corbett and J. Anderson. Knowledge tracing: modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4:253–278, 1995.

[C.C88] C.C.I.T.T. Recommendation X.509. The Directory – Authentication Framework, 1988.

[CK94] R. Cook and J. Kay. The justified user model: a viewable, explained user model. In A. Kobsa and D. Litman, editors, Proceedings of the Fourth International Conference on User Modeling UM94, pages 145–150, Hyannis, Massachusetts, USA, 1994.

[CKRT95] Ronny Cook, Judy Kay, Greg Ryan, and Richard C. Thomas. A toolkit for appraising the long term usability of a text editor. Software Quality Journal, 4(2):131–154, 1995.

[CMN83] S. K. Card, T. P. Moran, and A. Newell. The Psychology of Human-Computer Interaction. Erlbaum, Hillsdale, New Jersey, 1983.

[CMS99] S. K. Card, J. D. Mackinlay, and B. Shneiderman, editors. Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann, 1999.

[Con87] J. Conklin. Hypertext: A survey and introduction. IEEE Computer, 20(9):17–41, 1987.

[CU95] Bay-Wei Chang and David Ungar. Animation: From cartoons to the user interface. Technical report, Sun Microsystems Laboratories, Inc, Mountain View, California, 1995.

[Don95] Judith S. Donath. Visual who: Animating the affinities and activities of an electronic community. In Proceedings of ACM Multimedia ’95, 1995.

[dub99] Dublin Core Metadata Initiative, 1999. http://purl.org/DC.

[ELE92] S. G. Eick, J. L. Steffen, and E. E. Sumner. Seesoft – a tool for visualizing line oriented software statistics. IEEE Transactions on Software Engineering, 18(11):957–968, 1992.

[FK00] Josef Fink and Alfred Kobsa. A review and analysis of commercial user modeling servers for personalization on the world wide web. User Modeling and User-Adapted Interaction. Special issue on deployed user modeling, 10(3-4):209–249, 2000.

[Fur81] G. W. Furnas. The fisheye view: A new look at structured files. Technical Report 81-11221-9, Bell Laboratories, 1981.

[gmp] The University of Sydney Medical Program. http://www.smp.usyd.edu.au/.

[Goo87] D. Goodman. The Complete HyperCard Handbook. Bantam Books, 1987.

[GZROSC99] J. Greer, J. D. Zapata-Rivera, C. Ong-Scutchings, and J. E. Cooke. Visualization of Bayesian learner models. In Proceedings of the workshop ‘Open, Interactive, and other Overt Approaches to Learner Modelling’ at AIED'99, Le Mans, France, July 1999.

[HKW+96] Kristina Höök, Jussi Karlgren, Annika Wærn, Nils Dahlbäck, Carl Gustaf Jansson, Klas Karlgren, and Benoît Lemaire. A glass box approach to adaptive hypermedia. User Modeling and User-Adapted Interaction, 6(2-3):157–184, 1996.

[HMM99] I. Herman, G. Melançon, and M. S. Marshall. Graph visualisation and navigation in information visualisation, 1999. http://www.cwi.nl/InfoVisu/Survey/StarGraphVisuInInfoVis.html.

[HMT87] Frank G. Halasz, T. P. Moran, and R. H. Trigg. NoteCards in a nutshell. In Proceedings of the 1987 ACM Conference on Human Factors in Computing Systems, pages 45–52, April 1987.

[HS94] Frank Halasz and Mayer Schwartz. The Dexter hypertext reference model. Communications of the ACM, 37(2):30–39, February 1994.

[HS97] T. Howes and M. Smith. LDAP: Programming directory-enabled applications with the Lightweight Directory Access Protocol. MacMillan, 1997.

[HTM99] HTML home page, 1999. http://www.w3.org/MarkUp/.

[icd92] The International Classification of Diseases, 10th edition, 1992. http://www.who.int/whosis/icd10/.

[JBW99] Jerome Wiesner: A random walk through the twentieth century, 1999. http://ic.www.media.mit.edu/JBW/.

[JPT97] A. Jameson, C. Paris, and C. Tasso, editors. User Modeling, Proceedings of the Sixth International Conference UM97, New York, 1997. Springer-Verlag.

[Kay99] J. Kay. A scrutable user modelling shell for user-adapted interaction. PhD thesis, University of Sydney, 1999.

[KC93] J. Kay and K. Crawford. Metacognitive processes and learning with intelligent educational systems. In P. Slezak, editor, Cognitive Science Down Under, pages 63–77. Ablex, 1993.

[KG93] J. Kay and N. P. Gheibi. Supporting a coaching system with viewable learner models. In International Conference for Computers and Computer Technologies in Education, 1993.

[KKM93] Alfred Kobsa, T. Kühme, and U. Malinowski. User modeling: recent work, prospects and hazards, pages 111–128. North Holland, 1993.

[KKP01] Alfred Kobsa, Jürgen Koenemann, and Wolfgang Pohl. Personalized hypermedia presentation techniques for improving online customer relationships. The Knowledge Engineering Review, 2001.

[Kob90] Alfred Kobsa. User modeling in dialog systems: potentials and hazards. Computational Intelligence, 6(4):193–208, 1990.

[KP95] Alfred Kobsa and Wolfgang Pohl. The BGP-MS user modeling system. User Modeling and User-Adapted Interaction, 4(2):59–106, 1995.

[KS78] H. Kadmon and E. Shlomi. A polyfocal projection for statistical surfaces. Cartograph, 15(1):36–41, 1978.

[LA94] Y. K. Leung and M. D. Apperley. A review and taxonomy of distortion-oriented presentation techniques. ACM Transactions on Computer-Human Interaction, 1:126–160, 1994.

[Lag96] Carl Lagoze. The Warwick framework. D-Lib Magazine, July/August 1996.

[Lam94] Leslie Lamport. LaTeX: A Document Preparation System. Addison-Wesley, 1994.

[Law84] M. J. Lawson. Being executive about meta-cognition. Academic Press, Orlando, 1984.

[LOS92] H. Liao, M. Osada, and Ben Shneiderman. A formative evaluation of three interfaces for browsing directories using dynamic queries. Technical Report CS-TR-2841, CAR-TR-605, Dept. of Computer Science, University of Maryland, 1992.

[LR96] J. Lamping and R. Rao. The hyperbolic browser: A focus + context technique for visualizing large hierarchies. Journal of Visual Languages and Computing, 7(1):33–55, 1996.

[LS99] Ora Lassila and Ralph Swick. Resource description framework (RDF) model and syntax specification, 1999. http://www.w3.org/TR/REC-rdf-syntax/.

[Mat01] Mathematical markup language MathML, version 2.0, 2001. http://www.w3.org/TR/MathML2/.

[mes01] MeSH: Medical subject headings, 2001. http://www.nlm.nih.gov/mesh/.

[Mun00] Tamara Munzner. Interactive Visualization of Large Graphs and Networks. PhD thesis, Stanford University, June 2000.

[Mur96] Michael Murtaugh. The automatist storytelling system. Master’s thesis, MIT, MIT Media Lab, 1996.

[Nor83] Donald A. Norman. Some observations on mental models. Lawrence Erlbaum, 1983.

[NSP96] C. North, B. Shneiderman, and C. Plaisant. User controlled overviews of an image library: A case study of the Visible Human. In Proceedings of the ACM Digital Libraries 96, pages 74–82, 1996.

[NSS88] B. Clifford Neuman, J. G. Steiner, and J. I. Schiller. Kerberos: An authentication service for open network systems. In Winter 1988 (USENIX) Conference, pages 191–201, Dallas, TX, USA, 1988.

[Orw96] J. Orwant. For want of a bit the user was lost: Cheap user modeling. IBM Systems Journal, 35:398–416, 1996.

[PG00] Noroja Parandeh-Gheibi. Evaluation of a viewable user model in an authentic field study of teaching a text editor. Master's thesis, Department of Computer Science, The University of Sydney, 2000.

[PIC99] Platform for internet content selection, 1999. http://www.w3c.org/PICS/.

[PS95] Ana Paiva and John Self. TAGUS – a user and learner modeling workbench. User Modeling and User-Adapted Interaction, 4(3):197–226, 1995.

[PSH95] A. Paiva, J. Self, and R. Hartley. Externalising learner models. In J. Greer, editor, Proceedings of the World Conference on Artificial Intelligence in Education, Washington DC, U.S.A., 1995. AACE.

[PtCotEU95] The European Parliament and the Council of the European Union. European community directive on data protection, 1995. http://www.doc.gov/ecommerce/eudir.htm.

[RC94] R. Rao and Stuart K. Card. The table lens: Merging graphical and symbolic representations in an interactive focus + context visualization for tabular information. In Proceedings of CHI'94, pages 318–332, 1994.

[RCM93] G. G. Robertson, S. K. Card, and J. D. Mackinlay. Information visualization using 3D interactive animation. Communications of the ACM, 36(4):57–71, 1993.

[res97] Recommender systems. Communications of the ACM, 40(3):56–58, 1997.

[RMC91] George G. Robertson, Jock D. Mackinlay, and Stuart K. Card. Cone trees: Animated 3D visualizations of hierarchical information. In Scott P. Robertson, Gary M. Olson, and Judith S. Olson, editors, Proc. ACM Conf. Human Factors in Computing Systems, CHI, pages 189–194. ACM Press, 1991.

[Sel94] J. Self. Formal approaches to student modelling, pages 295–352. Springer-Verlag, 1994.

[SH91] A. Siochi and D. Hix. A study of computer-supported user interface evaluation using maximal repeating pattern analysis. In S. P. Robertson, G. M. Olson, and J. S. Olson, editors, Proceedings of the ACM SIGCHI conference on Human Factors in computing systems, pages 301–305, Reading, MA, 1991. Addison-Wesley Publishing.

[SHJ94] H. Skotnes, G. Hartvigsen, and D. Johansen. 3D visualization of weather forecasts and topography. Technical Report 94-15, University of Tromsø, September 1994.

[Sma96] D. Small. Navigating large bodies of text. IBM Systems Journal, 35:515–525, 1996.

[SMI98] Synchronized multimedia integration language (SMIL) 1.0 specification, 1998. http://www.w3.org/TR/REC-smil/.

[SVG00] Scalable vector graphics SVG 1.0 specification, November 2000. http://www.w3.org/TR/SVG/.

[TR97] Lloyd A. Treinish and L. Rothfusz. Three-dimensional visualization for support of operational forecasting at the 1996 Centennial Olympic Games. In Proceedings of the Thirteenth International Conference on Interactive Information and Processing Systems for Meteorology, Oceanography and Hydrology, pages 31–34. American Meteorological Society, February 1997.

[Tuf83] Edward R. Tufte. The Visual Display of Quantitative Information. Graphics Press, 1983.

[Tuf90] Edward R. Tufte. Envisioning Information. Graphics Press, 1990.

[Tuf97] Edward R. Tufte. Visual Explanations. Graphics Press, 1997.

[WK86] W. Wahlster and A. Kobsa. Dialogue-based user models. Proceedings of the IEEE, 74(7):948–960, 1986.

[XHT00] XHTML 1.0: The extensible hypertext markup language, 2000. http://www.w3.org/TR/xhtml1/.

[XML99a] Extensible markup language (XML) 1.0, 1999. http://www.w3.org/TR/1998/REC-xml-19980210.

[XML99b] Namespaces in XML, 1999. http://www.w3.org/TR/1999/REC-xml-names-19990114.

[XML01] XML schema, 2001. http://www.w3.org/TR/xmlschema-1/.

[XSL00] Extensible stylesheet language XSL version 1.0, 2000. http://www.w3.org/TR/xsl/.

[Yus85] S. Yussen. The role of metacognition in contemporary theories of cognitive development, volume 1. Academic Press, Orlando, 1985.

[ZRG00] J. D. Zapata-Rivera and J. Greer. Inspecting and visualizing distributed Bayesian student models. In Intelligent Tutoring Systems ITS 2000, Montreal, June 2000.

[ZRNG99] Juan-Diego Zapata-Rivera, Eric Neufeld, and Jim E. Greer. Visualization of Bayesian belief networks. In Proceedings of IEEE Visualization 1999 Late Breaking Hot Topics, pages 85–88, October 1999.