
The User's Mental Model of an Information Retrieval System: an Experiment on a Prototype Online Catalog


UCLA Publications

Title The user's mental model of an information retrieval system: An experiment on a prototype online catalog

Permalink https://escholarship.org/uc/item/2k3386nz

Journal International Journal of Human-Computer Studies, 51(2)

ISSN 1071-5819

Author Borgman, Christine L.

Publication Date 1999

DOI 10.1006/ijhc.1985.0318

Peer reviewed

THE USER'S MENTAL MODEL OF AN INFORMATION RETRIEVAL SYSTEM

Christine L. Borgman
Graduate School of Library and Information Science, University of California, Los Angeles

ABSTRACT

An empirical study was performed to train naive subjects in the use of a prototype Boolean logic-based information retrieval system on a bibliographic database. Subjects were undergraduates with little or no prior computing experience. Subjects trained with a conceptual model of the system performed better than subjects trained with procedural instructions, but only on complex, problem-solving tasks. Performance was equal on simple tasks. Differences in patterns of interaction with the system (based on a stochastic process model) showed parallel results. Most subjects were able to articulate some description of the system's operation, but few articulated a model similar to the card catalog provided in training. Eleven of 43 subjects were unable to achieve minimal competency in system use. The failure rate was equal between training conditions and genders; the only differences found between those passing and failing the benchmark test were in academic major and in frequency of library use.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1985 ACM 0-89791-159-8/85/006/0268 $00.75

INTRODUCTION

In the search to understand how a naive user learns to comprehend, reason about, and utilize an interactive computer system, a number of researchers have begun to explore the role of the user's mental model of a system. Among the claims are that a mental model is useful for determining methods of interaction [1,2], problem solving [2,3], and debugging [4]; that model-based training is superior to procedural training [2,5,6]; that users build models spontaneously, in spite of training [1,7]; that incorrect models lead to problems in interaction [4,7]; and that interface design should be based on a mental model [8,9]. Not surprisingly, these authors use a variety of definitions for "mental model," and the term "conceptual model" is often used with the same meaning. Young [10] was able to identify eight different uses of the term "conceptual model" in the recent literature, for example. This author prefers the distinction made by Norman [7]: a conceptual model is a model presented to the user, usually by a designer, researcher, or trainer, which is intended to convey the workings of the system in a manner that the user can understand. A mental model is a model of the system that the user builds in his or her mind. The user's mental model may be based on the conceptual model provided, but is probably not identical to it.

The first studies comparing conceptual models to procedural instructions for training sought only to show that the conceptual training was superior [6,11]. Other recent research [1,2] has studied the interaction between training conditions and tasks, finding that model-based

training is more beneficial for complex or problem-solving tasks.

The research on mental models and training has been concentrated in the domains of text editing [11,12] and calculators [1,2,4,10]; no such research has yet been done in information retrieval. Information retrieval is an interesting domain, as it is now undergoing a shift in user population. In the last ten years, a significant population of highly-trained searchers who act as intermediaries for end users on commercial systems has developed. Although end users have been reluctant to use the commercial systems, libraries are rapidly replacing their card catalogs with online catalogs intended for direct patron use. The online catalogs are typically simpler to use and have a more familiar record structure, but still have many of the difficulties associated with the use of a complex interactive system. The result is a population of naive, minimally-trained, and infrequent users of information retrieval systems [13]. The need for an efficient form of training for this population is very great, and we chose it as a domain to test the advantages of model-based training.

EXPERIMENTAL METHOD

The experiment was structured as a two-by-two design, with two training conditions (model and procedural) and two genders. All subjects were undergraduates at Stanford University with two or fewer programming courses and minimal, if any, additional computer experience.

The introductory narrative provided to the model group described the system using an analogical model of the card catalog. The instructions first explained the structure of a divided (author/title/subject) card catalog and then explained the system structure in terms of the ways it was similar to a card catalog and the ways in which it was different. Boolean logic was described in terms of sets of catalog cards, showing sample sets and the resulting sets after specified Boolean combinations.

The narrative introduction for the procedural group consisted of background information on information retrieval that is commonly given in system manuals. The Boolean operators were defined only by single-sentence statements.

The examples provided were the same in each condition, but the annotations for each reflected the differences in the introductory materials. The list of searchable fields (16 of 25 fields were searchable) was also identical and gave examples of the search elements for each field.

The training tasks used for the benchmark test were all classified as simple tasks, requiring the use of only one index and no more than one Boolean operator. The experiment consisted of five simple and ten complex tasks, the latter requiring two or more indexes and one or more Boolean operators. All tasks were presented as narrative library reference questions and were designed to be within the scope of questions that might be asked by undergraduates in performing course assignments.
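The card-catalog framing of Boolean logic given to the model group maps directly onto set operations. The following is a minimal sketch of that idea; the record IDs and index terms below are invented for illustration and are not taken from the study's databases.

```python
# Sketch of the card-catalog framing of Boolean logic: each searchable
# element selects a set of catalog cards (record IDs here), and Boolean
# operators combine those sets. All IDs and terms are invented.

subject_cats = {101, 102, 105, 109}   # cards filed under the subject "cats"
subject_dogs = {102, 104, 109, 110}   # cards filed under the subject "dogs"
author_smith = {105, 110, 111}        # cards filed under the author "Smith"

# AND -> intersection: records indexed under both subjects
both = subject_cats & subject_dogs

# OR -> union: records indexed under either subject
either = subject_cats | subject_dogs

# NOT -> difference: records about cats, excluding those by Smith
cats_not_smith = subject_cats - author_smith

print(sorted(both))            # [102, 109]
print(sorted(either))          # [101, 102, 104, 105, 109, 110]
print(sorted(cats_not_smith))  # [101, 102, 109]
```

A complex task in the study's sense would combine two or more indexes with one or more such operators, e.g. `(subject_cats | subject_dogs) - author_smith`.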

We performed the experiment on a prototype Boolean logic-based online catalog mounted on a microcomputer with online monitoring capabilities. Two bibliographic databases were mounted: a training database consisting of 50 hand-selected records on the topic of "animals" and a larger database of about 6,000 records systematically sampled from the 10-million record database of the OCLC Online Computer Library Center.

Subjects in each training condition received three training documents: an introductory narrative, a set of annotated examples of system operation, and a list of searchable fields.

Subjects were given the instructional materials to read and then performed the benchmark test, which consisted of completing 14 simple tasks on the small database in less than 30 minutes. The test was based on pilot test findings that those who took longest to complete the training tasks were least able to learn to use the system (r=-0.83, p<.05). If the subject passed the benchmark test, he or she was interviewed briefly, given the experimental tasks to perform, and then asked to perform one additional search while talking aloud for the experimenter. Subjects were interviewed again after completing the experiment.
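The pilot-test screening criterion rests on an ordinary Pearson correlation between training time and later performance. A hedged sketch of that computation follows; the data points are invented for illustration (the study reported r = -0.83 on its own pilot data).

```python
import numpy as np

# Invented pilot-style data: minutes spent on the training tasks vs.
# number of benchmark tasks completed correctly. The study's actual
# finding was r = -0.83, p < .05, on different data.
minutes_on_training = np.array([12.0, 15.0, 18.0, 22.0, 27.0, 35.0])
tasks_correct = np.array([14.0, 13.0, 12.0, 10.0, 7.0, 5.0])

# Pearson correlation from the 2x2 correlation matrix
r = np.corrcoef(minutes_on_training, tasks_correct)[0, 1]
print(f"Pearson r = {r:.2f}")  # strongly negative on this invented data
```

The negative sign is the whole point of the screening rule: slower trainees tended to be the ones who could not learn the system.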

RESULTS

Due to a high failure rate on the benchmark test (11 of 43, or 26%), we were able to gather a valid dataset of only 28 cases. The difference in time required to complete the benchmark test was significant (p<0.0001), with those failing averaging 39.2 minutes and those passing averaging 18.2 minutes. Subjects failed equally in the two training conditions and by gender.

Subjects who passed the benchmark test tended to be from science and engineering majors rather than social science and humanities (p<0.0001), and were less frequent visitors to the library (an average of 8.0 visits per month vs. 18.4 visits for those who failed). Major and library use were not correlated.

In task performance, we found no difference between training conditions on number of simple tasks correct (p>0.05). The difference on number of complex tasks correct was in the predicted direction (subjects in the model condition scored higher than those in the procedural condition) but was not significant (p=0.08).

The user actions and system responses captured in the monitoring data were reduced to 12 discrete states and treated as a stochastic process. The patterns of interaction were measured using the two-sample Kolmogorov-Smirnov (K-S) test. On simple tasks, we found no significant differences between training conditions on any of the zero-, first-, or second-order two-sample K-S tests (p>0.05 for each). On complex tasks, we found significant differences between training conditions at each level (p<0.01 for the zero-order test; p<0.001 for the first- and second-order tests).

The analysis of model articulation ability was based on four measures coded from the interview data: completeness of the model, accuracy of the model, level of abstraction, and use of a model in approaching the tasks. The first three variables were highly correlated, necessitating their combination into an index. We found no difference between conditions on either the model index or on the task approach variable.

If the subjects were able to describe the system's operation at all, it was most likely in terms of an abstract model bearing little resemblance to a card catalog analogy. Of 28 subjects, 15 (5 model condition, 10 procedural) gave some form of abstract model, four (3 model, 1 procedural) articulated a card catalog-based model, only one subject (procedural condition) articulated a model based on another metaphor (robots retrieving sheets of paper from bins), and eight subjects (6 model, 2 procedural) were unable to describe the system in any model-based manner.

Only minor differences between genders were found. Men scored higher than women (p<0.05) on the index of describing the system, although gender explained only 14% of the variance in the model index in a linear regression. Men were found to make more errors on simple tasks than women (p<0.05), but the difference was not significant for errors on complex tasks. On simple tasks, men and women reflected different patterns of use at all three levels of zero-, first-, and second-order transitions (p<0.01, 0.01, 0.001, respectively). On complex tasks, men and women also reflected different patterns of use at all three levels (p<0.01, 0.05, 0.01, respectively), although less strongly.

A more complete description of the results can be found in Borgman [14].

DISCUSSION

Perhaps the most surprising (and unpredicted) finding is the degree of difficulty encountered by some of the subjects in using the system. The system was similar to those in common use in libraries and the questions were similar to those an undergraduate might ask in seeking information for a course assignment. Yet more than one-fourth of the subjects could not complete 14 simple tasks in less than 30 minutes. The tasks were not difficult; nine of them were merely replications of the examples (which included the search result).

The subjects who had the most difficulty were those majoring in the social sciences and humanities. It

has frequently been conjectured that this group might have more difficulty using computing technology, but hard evidence is difficult to establish [15]. The effect is not explained by measures commonly associated with major, such as number of math and science courses or number of computing courses.

It is doubtful that academic major alone is the factor determining success or failure at the information retrieval task. It is more likely that academic major is a surrogate for some other measure. Related research in human factors of computing has begun to identify psychological and skill factors that influence computing ability, such as cognitive style [15], spatial memory, and age [16]. The pattern differences between men and women also suggest that some individual differences may be operating. The individual differences issues are of particular concern for online catalogs in library environments, most of which serve a very heterogeneous population. Given the minimal control that system administrators have over training this class of users, it is important that the system be easily accessible by a broad population.

Another factor that distinguished those who passed the benchmark test from those who failed was frequency of library usage. The result is in the opposite direction of that which would be predicted: the frequent library users failed and the infrequent ones passed. If frequency of library usage were correlated with major, this result would be easier to explain. However, we can say that frequent visits to the library (for whatever purpose) offer no advantage in learning to use an online catalog.

The performance differences were in the predicted direction, but less strong than we had hoped. However, the performance results were bolstered by the stronger pattern differences in the monitoring data: no significant differences on simple tasks but very significant differences on complex tasks. The pattern differences suggest at least a difference in method of interaction, if not a difference in cognitive processing. Given the nature of these results, the interaction effect, the small sample size, and the small number of tasks, we consider the hypothesis to be supported. We would be reluctant to generalize the findings beyond this sample, however.

The results of this research and that of Halasz & Moran [2] show that model-based training is superior only for complex or problem-solving tasks. Our next challenge is to delineate the distinction between simple and complex tasks and thereby isolate the factors that may cause such an interaction. These issues are left for future research.

The predicted differences in model articulation based on training condition were wholly unsupported. The problem may have been methodological; the questions used to solicit the model appear to have been interpreted in a variety of ways. A more constructive interpretation is that we may have captured the variance in who is able to articulate a model, rather than in who is able to build a model. It is possible that mental models were constructed in precisely the manner predicted, yet we were unable to capture this result. We can consider the presence of a model description sufficient to indicate that the mental model exists, but not a necessary condition. This interpretation is reinforced by the lack of correlation between task performance and model articulation.

Another interesting aspect of the model articulation results is the lack of correlation between the ability to describe the approach to search tasks and the ability to describe the system. Subjects were frequently able to describe their approach to performing searching tasks in terms of the system's operation, but were unable to describe the same operations when asked how the system worked. It is possible that the questions solicited two types of models. The model used in problem solving (which results in performance effects) may be different from the model used in describing the system. According to Halasz [3], these two types of models may occur in sequence: one first builds a model for problem solving and only after practice is able to explain how it works. This interpretation is reinforced by the fact that no subject was able to describe the system but not able to describe his or her approach to the tasks.
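The monitoring-data comparison underlying these pattern differences can be sketched as follows. This is a minimal sketch, not the study's code: the state names and sessions are invented, the study used 12 states, and the study reported p-values from two-sample Kolmogorov-Smirnov tests, which this sketch reduces to computing the K-S distance between the two groups' transition distributions.

```python
import numpy as np
from collections import Counter

def ngrams(seq, n):
    """All length-n windows of a state sequence: n=1, 2, 3 give the
    zero-, first-, and second-order transition patterns, respectively."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def ks_distance(group_a, group_b, n):
    """K-S statistic between two groups' n-gram distributions: the largest
    gap between their cumulative relative frequencies over a shared
    ordering of the observed n-grams."""
    a = Counter(g for s in group_a for g in ngrams(s, n))
    b = Counter(g for s in group_b for g in ngrams(s, n))
    vocab = sorted(set(a) | set(b))
    pa = np.array([a[g] for g in vocab], dtype=float)
    pb = np.array([b[g] for g in vocab], dtype=float)
    pa /= pa.sum()
    pb /= pb.sum()
    return float(np.max(np.abs(np.cumsum(pa) - np.cumsum(pb))))

# Hypothetical sessions, each reduced to a sequence of discrete states
model = [["start", "index", "term", "combine", "display", "end"],
         ["start", "index", "term", "display", "end"]]
proc  = [["start", "term", "display", "term", "error", "display", "end"],
         ["start", "term", "error", "term", "display", "end"]]

for label, n in [("zero-order", 1), ("first-order", 2), ("second-order", 3)]:
    print(f"{label}: D = {ks_distance(model, proc, n):.2f}")
```

Higher-order comparisons are stricter: two groups can match on overall state frequencies (zero-order) yet differ in which states follow which (first- and second-order), which is what distinguishes a difference in method of interaction from a mere difference in command usage.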

One last possibility is that the amount of time spent in training and system use was insufficient to develop the model. Models develop over time with exposure to the system. Given further practice, stronger results might have been seen.

CONCLUSIONS

The present study compared the use of conceptually-based training to that of procedurally-based training on a prototype online catalog. Although the training effects were not as strong as predicted, we did find the hypothesized interaction effect between training method and task complexity, indicating that conceptually-based training is not always superior. The challenge of delineating when it is superior remains.

As expected, we found that it is easier to measure differences in who is able to articulate a model than in who is able to build a model. Subjects in both conditions were able to develop models to some degree, indicating that people do build models even if not trained with them. The fact that no relationship was found between model articulation and performance further suggests that the measures captured articulation ability only.

Perhaps the most important finding from this experiment is not the mental model result but the likelihood of individual differences in the ability to use this particular technology. Given an equal number of math, science, and computing courses, engineering and science majors still out-performed the social science and humanities majors. This finding suggests that we may be building systems for which access is inequitable. We are particularly concerned about this result in library environments, where equal access to information for all is a primary goal of the profession. If the implementation of a new technology discriminates among our users, we must find a way to achieve equity through training, design, or additional assistance.

The research reported here is the first in what is intended to be a continuing research program. The second phase, to study the individual differences correlates of technology use, is already in progress [17]. New results from the later research will be incorporated in the conference presentation. It is our hope that this research will contribute not only to our understanding of human-computer interaction, but also to improving equity in access to information.

ACKNOWLEDGEMENTS

The research reported here was funded by the OCLC Online Computer Library Center, Dublin, Ohio. The interface simulator was developed and implemented by Howard Turtle and Trong Do, under the direction of Neal Kaske and W. David Penniman. The author also is grateful for the assistance of her dissertation advisor, William Paisley, and the other members of her committee, Everett M. Rogers, David A. Thompson, and Barbara Tversky, all of Stanford University.

REFERENCES

[1] Bayman, Piraye; Mayer, Richard E. 1984. Instructional manipulation of users' mental models for electronic calculators. International Journal of Man-Machine Studies, 20, 189-199.

[2] Halasz, Frank G.; Moran, Thomas P. 1983. Mental models and problem solving using a calculator. In Janda, Ann (ed.), Human Factors in Computing Systems: Proceedings of a conference sponsored by the Association for Computing Machinery Special Interest Group on Computer and Human Interaction and the Human Factors Society. 1983 December 12-15, Boston, MA. New York, NY: Association for Computing Machinery, 212-216.

[3] Halasz, Frank G. 1984. Mental models and problem solving using a calculator. Ph.D. dissertation. Stanford, CA: Stanford University.

[4] Young, Richard M. 1981. The machine inside the machine: Users' models of pocket calculators. International Journal of Man-Machine Studies, 15, 51-85.

[5] Carroll, John M.; Thomas, John C. 1982. Metaphor and the cognitive representation of computing systems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-12:2, 107-116.

[6] Foss, Donald J.; Rosson, Mary Beth; Smith, Penny L. 1982. Reducing manual labor: An experimental analysis of learning aids for a text editor. In Association for Computing Machinery, Proceedings of the Human Factors in Computer Systems Conference. 1982 March 15-17, Gaithersburg, MD. New York, NY: Association for Computing Machinery, 332-336.

[7] Norman, Donald A. 1983. Some observations on mental models. In Gentner, Dedre; Stevens, Albert L. (eds.), Mental Models. Hillsdale, NJ: Lawrence Erlbaum Associates.

[8] Jagodzinski, A. P. 1983. A theoretical basis for the representation of on-line computer systems to naive users. International Journal of Man-Machine Studies, 18, 215-252.

[9] Moran, Thomas P. 1981. The command language grammar: A representation for the user interface of interactive systems. International Journal of Man-Machine Studies, 15, 3-50.

[10] Young, Richard M. 1983. Surrogates and mappings: Two kinds of conceptual models for interactive devices. In Gentner, Dedre; Stevens, Albert L. (eds.), Mental Models. Hillsdale, NJ: Lawrence Erlbaum Associates.

[11] Mack, Robert L.; Lewis, Clayton H.; Carroll, John M. 1983. Learning to use word processors: Problems and prospects. Association for Computing Machinery Transactions on Office Information Systems, 1:3, 254-271.

[12] Douglas, Sarah A.; Moran, Thomas P. 1983. Learning a text editor by analogy. In Janda, Ann (ed.), Human Factors in Computing Systems: Proceedings of a conference sponsored by the Association for Computing Machinery Special Interest Group on Computer and Human Interaction and the Human Factors Society. 1983 December 12-15, Boston, MA. New York, NY: Association for Computing Machinery, 207-211.

[13] Matthews, Joseph R.; Lawrence, Gary L.; Ferguson, Douglas K. 1983. Using online catalogs: A nationwide survey. New York, NY: Neal-Schuman.

[14] Borgman, Christine L. 1984. The user's mental model of an information retrieval system: Effects on performance. Unpublished Ph.D. dissertation, Stanford University.

[15] Coombs, J. J.; Gibson, R.; Alty, J. L. 1982. Learning a first computer language: Strategies for making sense. International Journal of Man-Machine Studies, 14:4, 449-486.

[16] Egan, Dennis E.; Gomez, Louis M. 1984. Assaying, isolating, and accommodating individual differences in learning a complex skill. In Dillon, Ronna F. (ed.), Individual Differences in Cognition, Vol. 2. New York, NY: Academic Press.

[17] Borgman, Christine L. 1984. Individual differences in learning to use a library online catalog: Pilot project. Research project funded by the Spencer Foundation.