
The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-20)

An Experimental Approach to Ethics Education

Tom Williams, Colorado School of Mines, Golden, CO 80402
Qin Zhu, Colorado School of Mines, Golden, CO 80402
Daniel Grollman, Plus One, Boulder, CO 80301

Abstract

We propose an experimental ethics-based curricular module for an undergraduate course on Robot Ethics. The proposed module aims to teach students how human subjects research methods can be used to investigate potential ethical concerns arising in human-robot interaction, by engaging those students in real experimental ethics research. In this paper we describe the proposed curricular module, describe our implementation of that module within a Robot Ethics course offered at a medium-sized engineering university, and statistically evaluate the effectiveness of the proposed curricular module in achieving desired learning objectives. While our results do not provide clear evidence of a quantifiable benefit to undergraduate achievement of the described learning objectives, we note that the module did provide additional learning opportunities for graduate students in the course, as they helped to supervise, analyze, and write up the results of this undergraduate-performed research experiment.

Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

Educators are increasingly acknowledging that it is insufficient for the Computer Science curriculum to focus entirely on technical issues relating to computing and programming. Rather, for Computer Science students to be effective practitioners after graduation, their education must include key concepts from the arts, social sciences, and humanities, so that students not only have the technical knowledge necessary to implement and analyze computational systems, but also have the knowledge from those other fields necessary to decide, on ethical grounds, whether they should implement those systems, and if so, how they should go about designing and evaluating the effectiveness of those systems. This is especially true in the fields of Artificial Intelligence and Human-Robot Interaction, in which myriad ethical concerns have captured the public’s attention, and in which human interactivity necessitates design and evaluation techniques not otherwise taught in the Computer Science curriculum.

Appropriate teaching of these skills is made especially challenging by the interdisciplinary nature of the subject matter. AI Ethics education must clearly cover key concepts from ethics and moral philosophy as well as the use of those concepts to analyze applications of AI and Robotics. Moreover, such courses may also cover computational approaches to moral decision making; psychological theories of moral decision making and blame; and methods for experimentally investigating ethical issues. This requires instructors with broad interdisciplinary backgrounds, and pedagogical methods that draw on these disparate disciplines.

In this work, we explore pedagogical techniques aimed at improving students’ achievement of learning objectives that sit at this confluence of disciplines. Specifically, we present and analyze the efficacy of a Robot Ethics course module in which students participate as experimenters in experimental robot ethics research, which requires them to simultaneously learn methodological approaches to the study of experimental robot ethics, and then use those methods to engage with key theoretical concepts from robot ethics (in our case, normative influence). Moreover, in this work we not only analyze the efficacy of this module, but additionally interrogate how this efficacy depends on the role that each student played in the research process.

This educational research effort thus involves two nested levels of experimentation: a randomized controlled experiment in which the research participants were undergraduate students enrolled in a Robot Ethics class and the researchers were the course staff, and a second randomized controlled experiment in which the research participants were undergraduate students sampled from across the university, and the researchers were both the Robot Ethics students and their instructors. In this paper, we will focus on the first of these experiments (the AI Education research effort), leaving many of the details of the second experiment (the experimental Robot Ethics research effort) to publication elsewhere.

As described in this paper, our analysis yielded mixed results with respect to the efficacy of this experimental course module. While we did not find any evidence that participating in experimental ethics research was any more effective than merely listening to a traditional lecture about the research effort and its goals in general, we did find that for students who participated in the research effort, the role they played may indeed have affected their learning of the concepts explored in that research. Moreover, as we will describe, the module provided additional learning opportunities to the small number of graduate students in the course, who supervised undergraduate students, and directly contributed to the data analysis and writing of the scientific paper submission resulting from the proposed curricular module.

Background

AI Ethics Education

Explicit discussions of AI or “expert systems” in computer ethics education literature and textbooks can be traced back to the 1990s, although most of these discussions are speculative reflections about broader “macro” and social impacts of AI on humans, cultures, and societies. For instance, Forester and Morrison (1994) hold a humanistic and speculative view toward the employment of AI in society; their major concerns include: (1) whether AI is a proper goal, as most AI projects are funded by the military; (2) that it is a technocratic idea to employ AI in public administration, legal practice, and social governance; (3) that introducing AI to developing countries is another techno-fix that attempts to remedy the symptoms without addressing the causes; and (4) that AI degrades the human condition.

Two curriculum design approaches have been developed to teach ethics in AI and robotics: (1) standalone courses (these courses can be offered in either computer science or philosophy); and (2) ethics modules in AI and robotics courses (Burton et al. 2017). Many of these curriculum development approaches are not very different from traditional “applied ethics” approaches to teaching ethics of technology and engineering. For instance, in order to understand and discuss AI ethics issues, Burton et al. (2017) suggest that it is necessary for students to be familiar with three major ethical theories (deontology, consequentialism or utilitarianism, and virtue ethics) as tools for ethical decision-making. Students are then invited to practice using the three ethical theories to analyze specific AI ethical situations and formulate possible courses of action. Burton et al. (2017) point out that teaching students the three ethical theories and their applications in specific case studies can be achieved in either one module in a technical course or a semester-long, standalone AI ethics course (with additional readings and case studies).

An increasing number of universities such as Harvard, Stanford, and the University of Kentucky have started to offer AI and robot ethics in their computer science curriculum. Educators at these institutions have been experimenting with innovative ethics pedagogies. Burton, Goldsmith, and Mattei (2015; 2018) have explored the use of science fiction as a pedagogical tool for teaching AI ethics. They argue that using science fiction as a pedagogy has at least two strengths compared to traditional pedagogies such as lectures: (1) the futuristic settings of science fiction enable students to detach or “decontextualize” from political preconceptions; and (2) science fiction has been proven to be appealing and popular to students (Burton, Goldsmith, and Mattei 2015). Science fiction provides students with a “safe” environment for “discussing and reasoning about difficult and emotionally charged issues without making the discussion personal” (Burton, Goldsmith, and Mattei 2018). The same argument can be made for other similar media-based pedagogies such as using movies to teach AI and robot ethics. However, from the perspective of moral psychology, there is a gap between ethical reasoning (e.g., knowing what is good vs. bad and why) and ethical action (e.g., someone is committed to do good) (Rest et al. 1999). Effective professional ethics education requires future professionals to relate their moral learning experience to their own everyday personal and professional experience (or how they actually do things) (Martin 2000). For computer science students, it is crucial to reflect on how their moral learning experience is relevant to their everyday, practical experience, empathizing with potential users and their needs, and reflecting on the (powerful) role of their expertise in shaping society. As such, it is critical to consider how real-world examples and hands-on experiences with realistic AI and robotics technologies may help to fill this gap.

Carnegie Mellon’s “Artificial Intelligence Methods for Social Good” course, for example, goes beyond the traditional instructional approach that teaches theories of AI and robot ethics through classroom lectures alone. Students in the course instead acquire practical experience through research projects that employ AI methods to address pressing social issues in fields such as healthcare, social welfare, security and privacy, and environmental sustainability (Hsu 2018). Such hands-on experience may help students develop sensitivity to the normative influence of technology on humans and society. Arguably, this model of teaching computer science students to perform or experimentally investigate ethics in a technical class has some advantages. To some extent, it helps make visible the values that are embedded in the design of AI and robotic technologies. Furthermore, it creates a mindset among students that technological development involves value choices.

As science and engineering education curricula are already packed, integrating ethics modules into technical courses is often more realistic for faculty. Furey and Martin (2018) have shared their experience with integrating a module about the ethics of algorithm development for autonomous vehicles into a semester-long AI course. They argue that there are certain advantages to such a modular approach to AI ethics education: (1) modules are easily integrated into technical courses; and (2) students can connect the specific AI ideas they are learning in class to their ethical implications. In their module, the classical Trolley Problem and the utilitarian ethical framework were introduced to students and employed as tools for evaluating the benefits and costs of algorithm development. Nevertheless, one major concern with integrating ethics modules in technical classes is that they may create an impression that ethics is added or supplementary to technology.

Research-Based Pedagogy

Education researchers have established numerous benefits of undergraduate involvement in research outside of the classroom (Kardash 2000; Landrum and Nelsen 2002; Strayhorn 2010; Hathaway, Nagda, and Gregerman 2002; Laursen et al. 2010), prompting researchers to explore the integration of laboratory research into the undergraduate curriculum itself, in the form of laboratory research modules (Lopatto 2010). These efforts have had great success, finding that involvement of undergraduates in the research process as part of classroom learning notably increases interest in science and graduate studies (Harrison et al. 2011), improves specific course topic mastery, and benefits students’ general critical thinking skills (Harrison et al. 2011). While much of this work has focused on integration of laboratory research modules into upper-level classes, more recent work has sought to expose students to research even earlier by integrating laboratory research modules into lower-level courses as well (Harrison et al. 2011; Coker 2017), as one of the largest barriers to student involvement in research is awareness (Wayment and Dickson 2008).

Based on the success of these educational research efforts, it may be valuable to build on those approaches by integrating laboratory research modules into AI and Robot Ethics classes. We believe this will be successful for a number of reasons. First, we would expect students to gain the same benefits as have been observed with previous laboratory research modules (including improved mastery of course content, interest in graduate study, and improved critical thinking skills). Second, we believe this may be a unique opportunity to expose students from a variety of fields to research opportunities. Not only are AI and Robot Ethics courses appealing to students from a wide variety of disciplines¹, but the legwork of experimental ethics research can easily be performed by undergraduates from disparate backgrounds, without requiring them to have deep knowledge of the concepts being explored (assuming that experiments themselves are designed by instructors rather than students). Finally, it helps provide some quantitative content to an otherwise qualitative course, which may help build interest in the course’s other subject matter for those engineering students who are otherwise reluctant to engage with the course content.

¹ In our own Robot Ethics course, our students this year came from Applied Math and Statistics, Computer Science, Electrical Engineering, Engineering Physics, Mechanical Engineering, and Petroleum Engineering.

Experimental Ethics Curriculum Design

In this section, we describe the design and implementation of a research-based experimental ethics module in a primarily undergraduate robot ethics class at a medium-sized engineering university.

Our proposed curricular module was designed to achieve the following interdisciplinary learning objectives:

LO1: Normative Influence of Technology – Students should understand how technologies (like robots) can exert influence on human behaviors due to their perception as moral and social agents, and further understand how this influence can carry over into human-human relationships.

LO2: Experimental Ethics – Students should understand how human-subject experimentation can be used to explore the ethical implications of technology (in this case, the hypothesized normative influence of decisions made during robot interaction design).

LO3: Ethical Research Conduct – Students should understand ethical concerns that can arise in the design and conduct of Experimental Ethics experiments, and how those concerns should be addressed.

To achieve these learning objectives, our curricular design followed a multi-stage process.

Phase Zero: Research Ethics Certification

Before the Experimental Ethics Module is introduced in class, all students complete the CITI (Collaborative Institutional Training Initiative) Social-Behavioral-Educational basic course, a three-hour online course composed of reading passages interleaved with short quizzes, designed to teach social, behavioral, and educational researchers the basics of ethical research conduct, including risk assessment, informed consent, research concerns with specialized populations, and so forth (Braunschweiger and Hansen 2010).

This phase serves two purposes. First, it lays the groundwork for fulfillment of Learning Objective 3 by introducing and assessing key tenets of ethical research conduct. Second, it provides students with the certification needed to participate in conducting IRB-approved human-subjects research.

Phase One: Classroom Lecture

Next, students are introduced to key curricular concepts through a 45-minute classroom lecture². This lecture begins with an introduction to and motivation of experimental moral philosophy, experimental moral psychology, and ethics-oriented empirical studies of human-robot interaction. The lecture then introduces an ethical concern surrounding human-robot interaction design, and the design of a human subject experiment intended to assess the validity of that concern, including the experiment’s hypotheses, design, and procedure, and how data collected through this experiment can be statistically analyzed post-experiment to test the experiment’s hypotheses.

This phase lays the groundwork for fulfillment of Learning Objectives 1 and 2, by introducing and motivating experimental ethics and the core robot ethics topic underlying the presented experiment (in our case, normative influence of technology).

² Because the proposed module is intended to be highly flexible, we will for the moment leave the definition of these concepts vague; in the Implementation section below, we will go on to describe the specific concepts investigated in our implementation of the proposed curricular module.

Phase Two: Hands-On Training

Immediately after receiving this lecture, students travel to a human-robot interaction laboratory configured for human-subject experimentation and receive hands-on training for conducting the experiment. As described later on, our particular instantiation of the proposed curriculum involved training students either in the experimenter role (in which students learned how to guide participants through consent procedures, brief participants on their experimental task, and debrief participants on experimental motivations upon study completion) or in the wizard role (in which students learned how to teleoperate the robot used in the experiment).

This phase serves to reinforce the key concepts necessary to achieve all three learning objectives.

Phase Three: Research Participation

Finally, students leverage their training to assist in conducting the proposed experiment, with each student helping to run three participants through the experiment over the course of several weeks.

This phase serves to further reinforce the key concepts necessary to achieve all three learning objectives, thus incorporating real-world praxis into the learning process.

Assessments

Once all phases of the proposed curriculum are completed, student learning is assessed through an in-class quiz in which students must recall details of the experiment related to all three learning objectives, including research hypotheses, metrics, experimental design, and consent procedures.

Implementation

While the proposed curricular module is designed to be sufficiently flexible to teach a wide range of core Robot Ethics concepts, our implementation of this curricular module focused specifically on teaching the concepts necessary to achieve Learning Objective 1, i.e., Normative Influence of Technology. In this section we will briefly summarize this concept, and the experiment students were involved with on that topic.

With the increase of Internet of Things (IoT) technologies, voice interfaces are being added to a wide array of home and work appliances, including refrigerators, microwaves, and even faucets (Faucet 2019). Despite several decades of research into mixed initiative dialogue (Allen, Guinn, and Horvitz 1999; Horvitz 1999) and turn taking (Cassell, Torres, and Prevost 1999; Traum and Rickel 2002), the dominant paradigm in consumer-grade voice interaction is to use platform-specific wakewords, such as “Alexa”, “Okay Google”, or “Hey Siri”, to prevent false positives in speech recognition and improve user privacy. However, there has been significant public concern expressed in the mass media that wakeword-driven interactions may encourage technology-directed language that is terse and direct, and that if children become accustomed to addressing machines in this manner, this behavior could carry over into their interactions with other humans, leading to impolite human-directed behavior (Gordon 2018; Truong 2016). As consumer-facing interactive robots begin to be deployed into the wild, they will also likely require wakeword-based interaction, potentially with greater risk of these feared effects.

Human networks of social and moral norms are well known to be dynamic and malleable (Gino 2015), with norms defined, communicated, and enforced by community members (and the technologies with which they interact) (Verbeek 2011). As we have recently argued in our own work, social robots wield unique influence over these norms due to their unique sociotechnical niche, defined by their joint status as perceived community members and as technological tools (Jackson and Williams 2019). This influence, which social robots may wield both through direct persuasion and implicit social pressure (Briggs and Scheutz 2014; Kennedy, Baxter, and Belpaeme 2014; Jackson and Williams 2019; Winkle et al. 2019), may be especially strong among language-capable robots, due to greater levels of perceived social and moral agency, leading to greater influence on users’ systems of social and moral norms, including sociocultural norms such as norms of politeness.

In our implementation of the proposed curricular module, our in-class lecture and class-run experiment investigated this concern: the experiment examined the effect a designer’s choice of wakeword might have on both robot- and human-directed politeness. In the experiment, participants interacted with a SoftBank Pepper robot (Fig. 1) and a human confederate in a restaurant scenario. When interacting with the robot, participants were required, depending on their experimental condition, to use either a traditional wakeword (e.g., “Hey Pepper”) or a polite wakeword (e.g., “Excuse me, Pepper”). To investigate normative influence of technology, the experiment collected linguistic statistics surrounding participants’ use of politeness cues in their language towards both the robot and the human confederate throughout the experiment.

Figure 1: The SoftBank Pepper robot used in our laboratory research module.

Experimental Evaluation

The proposed curriculum was evaluated through a randomized controlled experiment performed through a Robot Ethics class at the Colorado School of Mines in Spring 2019. This course was crosslisted so as to be offered to a wide variety of students, with students able to take the course either for upper-level Computer Science or Humanities credit, or for graduate credit, with differing writing and programming requirements depending on the type of credit earned. The assessment measures were used to evaluate the efficacy of the curriculum. Data was collected from 32 undergraduate students in this course (25 Male, 6 Female, 1 NA/Other).

Procedure

At the beginning of the semester, IRB exemption was acquired for both our educational research and the human-subjects experiment described in the previous section. Once students had completed Phase Zero of the curricular module, an amendment was approved adding all students in the class to the exempted protocol.

All undergraduate students in the class were then assigned to one of three roles. Nine students were assigned to participate in the experiment as an experimenter, guiding experimental participants through consent procedures and debriefing them at the end of the experiment. Ten students participated in the experiment as a wizard, controlling the robot’s movements behind the scenes. Finally, thirteen students did not participate in the experiment, serving as a control condition. We originally intended to have eleven students in each of these three groups; however, after training had taken place, four students originally assigned to experimenter or wizard roles ultimately could not participate, e.g. due to overly restrictive schedules (many of our undergraduates have occasional six-class semesters due to heavy course requirements and strict course sequencing). External to these three experimental conditions, graduate students enrolled in the course were assigned to participate as confederates: actors who carried out the requests of participants during the experiment.

Phases One and Two were then carried out in an interleaved fashion: all students in the class first attended the first portion of the classroom lecture. Then, students in the wizard role and the control condition stayed in class for the second portion of the lecture while students in the experimenter role visited the lab, where graduate students led them through their version of Phase Two. Once this was complete, students in the experimenter role returned to class for the second portion of the lecture, while students in the wizard role visited the lab for their version of Phase Two, and students in the control condition left class early.

Finally, in the weeks following this lecture and training, the experimental ethics experiment (Phase Three) was run by the students in the class. Initial pilots of the experiment revealed that students in the wizard condition had difficulty accurately teleoperating the robot under the time pressures of live experimentation, and thus the experiment underwent minor revision. First, the graduate students in the class modified the robot used during the experiment (SoftBank’s Pepper) to perform the experiment autonomously, without teleoperation. This allowed participants previously designated as wizards to instead serve as confederates, a role which required very little additional training. The graduate students in the class then, instead of serving as confederates, merely served as supervisors, attending experiment sessions to supervise undergraduate experimenters and make sure that things ran smoothly. This modification thus yielded three new undergraduate groups: nine experimenters, ten confederates, and thirteen non-participating controls. From this point on, the experiment was run as expected, with each undergraduate in an experimenter or confederate role helping to run three experiment sessions.

Once the experiment was completed, the assessment quiz was administered as an ungraded pop quiz. For the purposes of this educational research, we selected from among the responses given by students on the quiz a number of “key” concepts, and scored each quiz based on the number of key concepts recalled in each category (research hypotheses, research metrics, experimental design characteristics, and consent procedure requirements). Coding and scoring was performed blind with respect to students’ experimental group membership. All students in the class then provided voluntary informed consent for the course staff to use their scores on this ungraded quiz for this educational research.

After this quiz was administered, students in the control group, who had not been required to participate in the running of the experiment, were given an annotation homework assignment, so that they could have the opportunity to be involved in the research effort and to prevent differences in workload across students, while allowing them to still function as a control group for the purposes of this educational research.

Hypotheses

In evaluating our proposed curriculum, we sought to test the following research hypotheses, each of which is associated with one of our key Learning Objectives.

H1: Normative Influence of Technology
(a) Students participating in the experimental ethics module will achieve better understanding of how technologies (like robots) can exert influence on human behaviors due to their perception as moral and social agents, and better understand how this influence can carry over into human-human relationships.
(b) This learning advantage will be especially true of students who participate in the Experimenter role (due to their repeated debriefing of experimental participants on the true focus of the research).

H2: Experimental Ethics
(a) Students participating in the experimental ethics module will achieve better understanding of how human-subject experimentation can be used to explore the ethical implications of technology (in this case, the hypothesized normative influence of decisions made during robot interaction design).
(b) This learning advantage will be especially true of students who participate in the Experimenter role (due to their repeated debriefing of experimental participants on the true focus of the research).

H3: Ethical Research Conduct
(a) Students participating in the experimental ethics module will achieve better understanding of ethical concerns that can arise in the design and conduct of Experimental Ethics experiments, and how those concerns should be addressed.

(b) This learning advantage will be especially true of students who participate in the Experimenter role (due to their repeated explanation and application of consent procedures to experimental participants).

Measures

To test Hypothesis 1, we assessed whether the mean number of research hypotheses and metrics recalled during the in-class quiz differed according to students’ experimental role.

To test Hypothesis 2, we assessed whether the mean number of experimental design characteristics recalled during the in-class quiz differed according to students’ experimental role.

To test Hypothesis 3, we assessed whether the mean number of consent procedure requirements recalled during the in-class quiz differed according to students’ experimental role.

Results

We analyzed our results using the JASP (JASP Team 2016) software package for Bayesian statistical analysis. A Bayesian Analysis of Variance with Bayes factor analysis (Morey, Rouder, and Jamil 2015) was performed to assess the effect of experimental condition on the mean number of key items recalled by students in each of the following categories: (1) research hypotheses, (2) research metrics, (3) experimental design characteristics, and (4) consent procedure requirements.

As shown in Table 1, our results presented weak evidence against any impact of experimental condition on recall of experimental metrics (BF=0.328)³, consent procedures (BF=0.346), or elements of experimental design (BF=0.332), and inconclusive evidence with respect to the effect of experimental condition on recall of research hypotheses (BF=0.884).

Post-hoc analysis of this inconclusive result revealed weak evidence against a difference between non-participating students and confederate students (BF=0.385), inconclusive evidence regarding a difference between non-participating students and experimenter students (BF=0.980), and moderate evidence in favor of a difference between confederate students and experimenter students (BF=6.606), with students in the experimenter role overall recalling more key hypotheses (μ=2.56, SD=0.73) than students in the confederate role (μ=1.7, SD=0.48), as shown in Fig. 2.

Figure 2: Observed differences in effect of experimental condition on recall of research hypotheses.

Taken together, our results suggest that the type of experimental participation had no effect on students’ recall of information regarding experimental procedure, but did have an effect on their recall of the experimental hypotheses, thus partially supporting Hypothesis 1 and refuting Hypotheses 2 and 3. Specifically, students who participated in the experiment as an experimenter recalled more details of the experiment’s goals than did students who participated as a confederate.

³ A Bayes Factor of 0.328 indicates that the ratio of probabilities between the two models is 0.328 times larger when measured using the posterior rather than the prior, indicating that the data observed were 0.328 times as likely if there were a difference between the groups as if there were not; or conversely, that the data were 1/0.328 = 3.049 times more likely to have been observed if there were not a difference between groups than if there were. A Bayes Factor with a value greater than 3.0 provides moderate evidence in support of the hypothesis in question: in this case, the null hypothesis.

Discussion

In this section we discuss practical lessons learned while implementing the proposed curriculum.

Scheduling

One of the major hurdles faced while implementing the proposed curriculum was student scheduling. While scheduling is typically fairly straightforward for normal experimental ethics experiments, this was not the case for this experiment due to the large number of experimenters, and due to the fact that most of the experimenters were undergraduate students with heavy course loads.

Because each student was required to help run three experimental participants, and because each experimental session required two undergraduate students to run the experiment, the course staff needed to collect from every student in the class a schedule of times at which they would be able to run participants, and compare those schedules to identify times at which pairs of students were free, while making sure to assign students to timeslots for which few other students were available. After each participant was run through the experiment, the course staff then needed to take note of which students had run that participant; once a student had finished all three of their required experimental slots, the course staff often then needed to consider who might be able to fill in for future slots that the now-finished student had been scheduled to run (in the case that earlier assigned slots went unfilled).

This entire procedure was a major effort on the part of the course staff. If the proposed curriculum were to be used again, the course staff would need to ensure that a
software-based scheduling solution were used. Due to the unique scheduling needs involved for the curricular module, a unique software solution would likely be required.

Measure                          Control      Confederate  Experimenter
Experimental Metrics             1.62 (1.12)  2.0 (0.82)   1.56 (0.53)
Consent Procedures               2.0 (1.35)   1.80 (0.63)  2.44 (1.13)
Elements of Experimental Design  0.62 (0.51)  0.60 (0.70)  0.33 (0.5)
Research Hypotheses              1.77 (1.3)   1.7 (0.48)   2.56 (0.73)

Table 1: Assessment Results. Each cell contains the mean (standard deviation) of key items in each category (rows) recalled by students in each condition (columns).

Sample Sizes

The statistical power of both the experimental ethics experiment and the educational research experiment was limited by small sample sizes. The small size of the course meant that only a small number of students fell into each experimental category. Moreover, because we only required each student to run three experimental participants, the total number of datapoints collected in the experimental ethics experiment was also smaller than desired. These issues could be ameliorated by increasing the number of sessions required of each student, or by accepting a larger number of students into the course. However, both of these solutions would have exacerbated the scheduling concerns described above, and may have significantly added to the cost of the experimental ethics experiment.

Graduate Students

While the implementation and analysis of our curricular module focused on the undergraduate students who served as experimenters or confederates during the experimental ethics experiment, the module may actually have been more effective for the graduate students in the class. While the graduate students were only responsible for supervising the undergraduates, and did not themselves interact with any experimental participants, they ended up participating much more deeply in the experiment, for a number of reasons.

First, because a small number of graduate students were responsible for supervising all experimental sessions, each graduate student ended up observing a much larger number of sessions than did each undergraduate student. Second, graduate students participated not just by following a script, but by observing sessions and watching for adverse events in order to identify and correct experimental problems, leading to a different level of engagement during experiments. Third, because the graduate students in the course were so deeply involved with the management and refinement of the experimental ethics experiment, all students were given (and took) the opportunity to become authors on the scientific paper that resulted, each contributing writing, figures, data analysis, and/or supplemental videos, allowing for much deeper engagement with the material and a much more substantial involvement in the research process.

Unfortunately, this creates a paradox from an education research perspective: because the graduate students were involved in the scientific process and in paper writing, they likely made greater learning gains than they would have if they had not been involved in this way; but because the students are publicly named co-authors, their grades (e.g., quiz scores) cannot be analyzed without partial de-anonymization.

Conclusion

We conducted a nested educational research experiment in which students achieved learning objectives by taking an active role in an experimental ethics laboratory experiment. Our results suggest that the proposed curriculum yielded no learning gains relative to a traditional lecture-based curriculum, but that among students who participated in the proposed curriculum, retention of the research hypotheses explored through the experimental ethics research experiment was greater for students who participated as an experimenter rather than as a confederate.

We do not believe these learning gains were sufficient to justify the added overhead imposed by this curriculum. However, it may be valuable to reexamine incorporation of this curriculum after addressing some of the process-based lessons learned while conducting the research. Moreover, it may be worth incorporating the curriculum for the benefits provided to graduate students in the course. Another option would be to explore the incorporation of experiment design and piloting (Coker 2017): in many human-robot interaction courses, for example, students are required to design and pilot a human-robot interaction experiment. Because these experiments are piloted rather than run with real participants, they have no associated cost; and because they are designed by students themselves, they lead to deeper engagement with the research questions under investigation. However, this means that they also are not publishable, and thus do not give the opportunity for research authorship. Nevertheless, it may be worth examining the use of this pilot-oriented research curriculum in AI and Robot Ethics classes for its other benefits listed above.

Acknowledgments

This work was supported in part by a grant from the National Science Foundation (IIS-1849348) and in part by Colorado School of Mines 2018 Tech Fee CS-01. The authors would like to extend thanks to the graduate students of Colorado School of Mines’ 2019 Robot Ethics class for their assistance in supervising the experiments described in this paper.

References

Allen, J. E.; Guinn, C. I.; and Horvitz, E. 1999. Mixed-initiative interaction. IEEE Intelligent Systems and their Applications 14(5):14–23.

Braunschweiger, P., and Hansen, K. 2010. Collaborative institutional training initiative (CITI). J Clin Res Best Pract 6:1–6.

Briggs, G., and Scheutz, M. 2014. How robots can affect human behavior: Investigating the effects of robotic displays of protest and distress. International Journal of Social Robotics 6(3):343–355.

Burton, E.; Goldsmith, J.; Koenig, S.; Kuipers, B.; Mattei, N.; and Walsh, T. 2017. Ethical considerations in artificial intelligence courses. AI Magazine 38(2):22–34.

Burton, E.; Goldsmith, J.; and Mattei, N. 2015. Teaching AI ethics using science fiction. In Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence.

Burton, E.; Goldsmith, J.; and Mattei, N. 2018. How to teach computer ethics through science fiction. Communications of the ACM 61(8):54–64.

Cassell, J.; Torres, O. E.; and Prevost, S. 1999. Turn taking versus discourse structure. In Machine conversations. Springer. 143–153.

Coker, J. S. 2017. Student-designed experiments: A pedagogical design for introductory science labs. Journal of College Science Teaching 46(5):14.

Faucet, D. 2019. Voice faucet. https://www.deltafaucet.com/Voice. Accessed: 2019-06-26.

Forester, T., and Morrison, P. 1994. Computer ethics: cautionary tales and ethical dilemmas in computing. MIT Press.

Furey, H., and Martin, F. 2018. Introducing ethical thinking about autonomous vehicles into an AI course. In Thirty-Second AAAI Conference on Artificial Intelligence.

Gino, F. 2015. Understanding ordinary unethical behavior: Why people who value morality act immorally. Current Opinion in Behavioral Sciences 3:107–111.

Gordon, K. 2018. Alexa and the age of casual rudeness. https://www.theatlantic.com/family/archive/2018/04/alexa-manners-smart-speakers-command/558653/. Accessed: 2019-06-26.

Harrison, M.; Dunbar, D.; Ratmansky, L.; Boyd, K.; and Lopatto, D. 2011. Classroom-based science research at the introductory level: changes in career choices and attitude. CBE–Life Sciences Education 10(3):279–286.

Hathaway, R. S.; Nagda, B. A.; and Gregerman, S. R. 2002. The relationship of undergraduate research participation to graduate and professional education pursuit: An empirical study. Jour. of College Student Development 43(5):614–631.

Horvitz, E. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems, 159–166. ACM.

Hsu, Y.-C. 2018. Designing Interactive Systems for Community Citizen Science. Ph.D. Dissertation, Carnegie Mellon University, Pittsburgh, PA.

Jackson, R. B., and Williams, T. 2019. On perceived social and moral agency in natural language capable robots. In 2019 HRI Workshop on The Dark Side of Human-Robot Interaction.

Jackson, R. B., and Williams, T. 2019. Language-capable robots may inadvertently weaken human moral norms. In 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 401–410. IEEE.

JASP Team. 2016. JASP. Version 0.8.0.0. Software.

Kardash, C. M. 2000. Evaluation of undergraduate research experience: Perceptions of undergraduate interns and their faculty mentors. Journal of Educational Psychology 92(1):191.

Kennedy, J.; Baxter, P.; and Belpaeme, T. 2014. Children comply with a robot’s indirect requests. In Proceedings of the 2014 ACM/IEEE international conference on Human-robot interaction, 198–199. ACM.

Landrum, R. E., and Nelsen, L. R. 2002. The undergraduate research assistantship: An analysis of the benefits. Teaching of Psychology 29(1):15–19.

Laursen, S.; Hunter, A.-B.; Seymour, E.; Thiry, H.; and Melton, G. 2010. Undergraduate research in the sciences: Engaging students in real science. John Wiley & Sons.

Lopatto, D. 2010. Science in solution. Tucson, AZ: Research Corporation for Science Advancement.

Martin, M. W. 2000. Meaningful work: Rethinking professional ethics. Practical and Professional Eth.

Morey, R. D.; Rouder, J. N.; and Jamil, T. 2015. BayesFactor: Computation of Bayes factors for common designs. R package version 0.9 9:2014.

Rest, J.; Narvaez, D.; Bebeau, M.; and Thoma, S. 1999. A neo-Kohlbergian approach: The DIT and schema theory. Educational Psychology Review 11(4):291–324.

Strayhorn, T. L. 2010. Undergraduate research participation and STEM graduate degree aspirations among students of color. New Directions for Institutional Research 2010(148):85–93.

Traum, D., and Rickel, J. 2002. Embodied agents for multi-party dialogue in immersive virtual worlds. In Proceedings of the first international joint conference on Autonomous agents and multiagent systems: part 2, 766–773. ACM.

Truong, A. 2016. Parents are worried the Amazon Echo is conditioning their kids to be rude. https://qz.com/701521/parents-are-worried-the-amazon-echo-is-conditioning-their-kids-to-be-rude/. Accessed: 2019-06-26.

Verbeek, P.-P. 2011. Moralizing technology: Understanding and designing the morality of things. University of Chicago Press.

Wayment, H. A., and Dickson, K. L. 2008. Increasing student participation in undergraduate research benefits students, faculty, and department. Teaching of Psychology 35(3):194–197.

Winkle, K.; Lemaignan, S.; Caleb-Solly, P.; Leonards, U.; Turton, A.; and Bremner, P. 2019. Effective persuasion strategies for socially assistive robots. In 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 277–285. IEEE.
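The Bayes factors reported in the Results section can be read in either direction: a value below 1 favors the model with no difference between groups, and its reciprocal says by how much. The following is a minimal Python sketch of that reciprocal reading. The function name `favored_hypothesis`, the dictionary `REPORTED_BF`, and its labels are illustrative assumptions and are not part of the JASP analysis; the numeric values are copied from the Results text.

```python
from typing import Tuple

# Bayes factors as reported in the Results text (difference vs. no difference).
# The dictionary name and labels are hypothetical, chosen for illustration only.
REPORTED_BF = {
    "experimental metrics": 0.328,
    "consent procedures": 0.346,
    "elements of experimental design": 0.332,
    "research hypotheses": 0.884,
    "confederate vs. experimenter (post hoc)": 6.606,
}


def favored_hypothesis(bf: float) -> Tuple[str, float]:
    """Return the favored model and how many times more likely the data are under it.

    bf is the Bayes factor for a difference between groups relative to no
    difference. Values below 1 favor "no difference"; the strength of that
    evidence is the reciprocal 1 / bf.
    """
    if bf <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf >= 1:
        return ("difference", bf)
    return ("no difference", 1.0 / bf)


if __name__ == "__main__":
    for label, bf in REPORTED_BF.items():
        model, ratio = favored_hypothesis(bf)
        print(f"{label}: data are {ratio:.3f} times more likely under '{model}'")
```

Reading every value as a ratio of at least 1 makes the strength of evidence for and against a difference directly comparable, which is the conversion the footnote on BF=0.328 carries out by hand.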
