
LIAR LIAR NEURONS FIRE: HOW EXECUTIVE

CONTROL PROCESSES CONTRIBUTE TO THE

ABILITY TO DECEIVE

Ian John Watkins

A thesis submitted in fulfilment of the requirements

for the degree of Doctor of Philosophy

The University of New South Wales

Faculty of Science

School of Psychology

July 2015

THE UNIVERSITY OF NEW SOUTH WALES Thesis/Dissertation Sheet

Surname or Family name: Watkins

First name: Ian Other name/s: John

Abbreviation for degree as given in the University calendar: PhD

School: School of Psychology Faculty: Faculty of Science

Title: Liar liar neurons fire: how executive control processes contribute to the ability to deceive

Abstract 350 words maximum: (PLEASE TYPE) This thesis presents a series of empirical investigations into the executive demands of deception. The first two experiments investigated whether the executive demands of deception are sufficient to influence receiver perceptions of credibility. Participant-senders in Study 1 (n = 52) and Study 2 (n = 97) completed a false opinion task and a battery of cognitive tasks. Deception performance was operationalized via participant-receiver judgements of veracity (Study 1, n = 624; Study 2, n = 1140). While the results from Study 1 showed a small positive relationship between executive abilities and deception performance, the results from Study 2 were stronger. They indicated that while working memory skill had a moderate positive relationship with deception performance, set shifting and inhibitory control skills were unrelated to deception performance once working memory skill had been taken into account.

The third study used a resource depletion framework to experimentally manipulate executive abilities. Participant-senders (n = 114) completed two false opinion tasks; one before the administration of a cognitive task (either an executive task designed to deplete the availability of executive resources or one of two control tasks) and the other immediately after. Once again deception performance was operationalized via participant-receiver judgements of veracity (n = 798). The results indicated that while deception performance was impaired by the executive task, it was relatively unaffected by either of the control tasks.

The fourth study presents a theoretical analysis assessing the appropriateness of standard by-judge and by-sender aggregating procedures commonly used in deception detection research. A series of Monte Carlo simulations demonstrated that the aggregation of deception data can cause inflated Type 1 error rates and poor statistical power and that Generalized Linear Mixed Models (GLMMs) may overcome these problems. Consequently, a series of GLMMs were used to re-analyze the data from Study 3. The results were consistent with previous analyses.

Overall, the evidence reported in this thesis demonstrates that the demands of deceiving in false opinion tasks are sufficient to influence a person’s behaviours such that those with poor executive abilities tend to be worse liars than those with good executive abilities.

Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorize University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only).

Signature: …………………………………………   Witness: …………………………………………   Date: 12/1/2016

The University recognizes that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research.

FOR OFFICE USE ONLY Date of completion of requirements for Award:

ORIGINALITY STATEMENT I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.

Signed ……………………………………………......

Date ……………… 12/1/2016 …………………......


COPYRIGHT STATEMENT

'I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.'

Date ... 15/ . 2../1.~......

AUTHENTICITY STATEMENT

'I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.'

Date ...... /S/ . 2 . /1.~......

TABLE OF CONTENTS Page

Acknowledgements ……… vi
Thesis Overview ……… vii
List of Figures ……… viii
List of Tables ……… ix

SECTION 1: INTRODUCTION

Chapter 1 Good Liars and Poor Liars……………………………………………………...… 1

SECTION 2: CRITICAL REVIEWS

Chapter 2 The Influence of Sender Motivation ……… 14
    Summary ……… 36
Chapter 3 The Influence of Personality Traits and Social Skills ……… 43
    Summary ……… 55
Chapter 4 The Cognition of Deception ……… 58
    General Summary ……… 72
    Thesis Aims ……… 75

SECTION 3: LABORATORY STUDIES

Chapter 5 Study 1 – Investigating the Executive Demands of Deception ……… 78
    Method ……… 79
    Results ……… 87
    Discussion ……… 101
Chapter 6 Study 2 – Controlling Measurement Error: Reinvestigating the Executive Demands of Deception ……… 108
    Method ……… 111
    Results ……… 121
    Discussion ……… 135
Chapter 7 Study 3 – Impairing Deception Performance by Depleting Working Memory ……… 141
    Method ……… 150
    Results ……… 156
    Discussion ……… 164
Chapter 8 Study 4 – Examining the use of Statistics in Deception Research ……… 167
    Reanalysis of Study 3 Data ……… 180
    Conclusion ……… 183

SECTION 4: GENERAL DISCUSSION

Chapter 9 Good Liars and Poor Liars ……… 186
    Implications for Deception Theory and Research ……… 193
    Practical Implications ……… 197
    Significance and Innovation ……… 198
    Limitations and Future Directions ……… 199
    Conclusions ……… 202

REFERENCES ……… 203
APPENDICES ……… 220
    Appendix A ……… 220
    Appendix B ……… 223
    Appendix C ……… 224
    Appendix D ……… 228
    Appendix E ……… 231
    Appendix F ……… 233
    Appendix G ……… 235


Acknowledgements

Writing a doctoral thesis is a challenging task, one that I could not have completed without support from some key people. I would first like to express my gratitude to my supervisor, Dr. Kristy Martire, whose intellectual contributions to this thesis were invaluable. Throughout the process, Kristy provided thoughtful comments regarding my research rationale and experimental methodology. She was never too busy to discuss my ideas or to meet with me when I was having difficulty resolving an issue. Kristy taught me to focus on the big picture and challenged me to engage with the theoretical and practical significance of my research. Without doubt, her insights and contributions greatly increased the scientific rigor of this thesis. More importantly, while her expertise in forensic psychology was invaluable, Kristy’s passion for high-quality psychological research was infectious and drove me to do the best work I could. Thanks, Kristy!

I would also like to thank my parents, Alan and Sheila Watkins, whose editorial and emotional support during the writing process was instrumental to the completion of this thesis. They were kind enough to proofread my entire thesis and helped me better my understanding of comma usage, sentence structure, split infinitives, and dangling participles, as well as other elusive grammatical rules. They also always made time to discuss how I was going (often at length) and helped redirect my focus towards the finishing line. More importantly, they provided unwavering support and encouragement during my formative years (and beyond), without which I would not have had the confidence to follow my intellectual pursuits and take ownership of the direction of my life. For this, I will be forever in their debt. Thanks, Mum and Dad!

Finally, I would like to give special thanks to my beautiful wife and best friend, Jennifer Watkins, whose continued love and support fuelled me when things seemed hardest. Without complaint, Jennifer endured countless hours of unintelligible rambling as I struggled to reason through some of the more complex ideas in this thesis. She provided me with inspiration when I was uninspired, motivation when I was unmotivated, and focus when I was unfocused. She also provided a pair of ears on which to vent my frustration and a kind word when I felt overwhelmed. Most importantly, she provided me with caffeine when I was tired (many thanks for this one). I doubt that I will ever be able to convey my appreciation fully, but I will endeavour to do so each and every day. Thanks, Jen!


Thesis Overview

Psychologists, legal authorities, and security personnel have a vested interest in developing and validating effective lie detection methods. The development of these methods, however, must be guided by evidence-based scientific theories regarding the psychology of deception. To this end, the current thesis presents a series of empirical investigations assessing the functional contribution of executive control processes to the production of successful deceptive communications.

At the outset, this thesis discusses the different theoretical perspectives as to why liars may differ from truth-tellers and the factors people generally need to overcome in order to lie successfully. Chapter 1 also presents the empirical evidence regarding the behavioural correlates of deception, as well as the evidence regarding people’s ability to detect deception and avoid detection. Chapters 2 and 3 critically review the literature pertaining to individual difference factors that are thought to influence the ability to deceive (sender motivation and sender personality traits/social skills, respectively). Chapter 4 discusses cognitive models of deception.

Chapters 5 through 7 present a series of experimental studies designed to assess whether, and to what extent, executive control processes contribute to the ability to deceive. These experiments are composed of three distinct parts. In part 1, participant-senders are video and audio recorded while providing both true and false opinions. In part 2, the same participant-senders complete a battery of tasks designed to measure/manipulate somewhat diverse aspects of their executive abilities. In part 3, participant-receivers evaluate the messages collected in part 1 for veracity.

Chapter 8 presents a theoretical analysis evaluating the appropriateness of standard by-judge and by-sender aggregating procedures commonly used in deception detection research and introduces an alternate method that overcomes the problems inherent in the traditional methods. It also presents the results from a series of Monte Carlo simulation studies contrasting the relative utility of the different statistical approaches. The data from the third study are then re-analysed using the alternate method.

The final chapter summarises the results from the four empirical studies and presents a general discussion regarding the implications of the findings. It also discusses the significance and innovation of the work, as well as its limitations. The final chapter concludes with a review of the aims of the thesis and makes recommendations regarding future directions for the field.

LIST OF TABLES Page

Table 1: Summary of Results from Two Hypothetical Deception Detection Experiments ……… 12
Table 2: Idealized Comparison of Effect Sizes to Infer Partial Correlations ……… 42
Table 3: Inter-Correlations Between Messages ……… 98
Table 4: Descriptive Statistics for the Executive Control Tasks ……… 99
Table 5: Zero-Order Correlations Between the Executive Control Tasks and Detectability Scores ……… 99
Table 6: Summary of Multiple Regression Analyses ……… 100
Table 7: Means and 95% Confidence Intervals for Opinion Items ……… 101
Table 8: Descriptive Statistics for the Executive Task Scores used to Estimate Latent Executive Abilities ……… 130
Table 9: Pearson Correlation Coefficients Between the Executive Task Scores and Detectability Scores ……… 130
Table 10: Fit Indices for the Structural Equation Models Predicting Naïve Detectability Scores ……… 133
Table 11: Fit Indices for the Structural Equation Models Predicting Informed Detectability Scores ……… 134
Table 12: Descriptive Statistics for the Working Memory Scores ……… 162
Table 13: Zero-Order Correlations Between the Working Memory Scores and the Amount of Impairment Observed in the OSPAN Condition ……… 162
Table 14: Summary of Multiple Regression Analysis ……… 163
Table 15: Example Decision-Level Dataset from a Hypothetical Deception Detection Experiment ……… 168
Table 16: Example Decision-Level Dataset used in Monte Carlo Simulations ……… 235


LIST OF FIGURES Page

Figure 1: Distribution of the decision variable across truth and lie trials ……… 89
Figure 2: Mean emotional activation scores by condition ……… 95
Figure 3: Mean cognitive load scores by condition ……… 96
Figure 4: Mean behavioural control scores by condition ……… 97
Figure 5: Mean emotional activation scores (original self-report measure) by condition ……… 123
Figure 6: Mean emotional activation scores (supplementary self-report measure) by condition ……… 124
Figure 7: Mean cognitive load scores (original self-report measure) by condition ……… 125
Figure 8: Mean cognitive load scores (supplementary self-report measure) by condition ……… 126
Figure 9: Mean behavioural control scores (original self-report measure) by condition ……… 127
Figure 10: Mean behavioural control scores (supplementary self-report measure) by condition ……… 128
Figure 11: The full three-factor model used in the confirmatory factor analysis ……… 131
Figure 12: Mean number of words correctly recalled by condition and task administration ……… 156
Figure 13: Mean ratings of task difficulty by condition and task administration ……… 158
Figure 14: Mean BMIS scores by condition and task administration ……… 159
Figure 15: Mean accuracy scores by task type and message setting ……… 160
Figure 16: Empirical Type 1 error rates by statistical method ……… 177
Figure 17: True discovery rates by statistical method ……… 179
Figure 18: Probability of a correct identification by task type and message setting ……… 182


Chapter 1: Good liars and poor liars ______

SECTION 1: INTRODUCTION

Chapter 1

Good Liars and Poor Liars

According to Dawkins (1976), deception is one of the most pervasive traits observed among evolved species. In his seminal text he describes several case examples of animal deception, ranging from butterflies that mimic the external appearance of other distasteful or stinging insects in an attempt to fool predators, to the remarkable ability of the cuckoo to disguise its eggs such that the host species has difficulty discriminating between the cuckoo’s eggs and its own. Dawkins concludes that deceit is a fundamental component of animal communication, arguing that the widespread selection of deceptive traits provides strong evidence for the adaptive value of deception. While these examples of animal communication may have deceptive elements, Dawkins is careful to distinguish between effects that have the functional equivalence of deception and the conscious intention to deceive. Indeed, the scientific study of deception requires a clear distinction between deceptive and non-deceptive acts.

Over the years, deception has been defined in many different ways in the scientific literature. Vrij (2008) defined deception as “a successful or unsuccessful deliberate attempt, without forewarning, to create in another a belief which the communicator considers to be untrue” (p. 15). An important feature of this definition is that it defines deception from the perspective of the communicator. This means that under this definition many simple acts of seemingly deceptive animal communication are no longer considered deceptive, such as the false markings on the wings of Dawkins’s butterflies. Clearly, the butterflies do not have a conscious intention to mislead potential predators. In the case of human deception, this means that the conveyance of false information is also not considered deceptive, provided the communicator is unaware that the information is incorrect. Simply stated, deception must be an intentional act to mislead.

With this definition in hand, it follows that the opportunities to create in another a belief which the communicator considers to be untrue increase with the number and sophistication of communicative channels. This has implications for the study of human deception as we have many ways in which we can deceive. Indeed, Wright (1994) argued that the large communicative potential of humans gave deception a firm footing in natural selection, stating that “We are far from the only dishonest species, but we are surely the most dishonest, if only because we do the most talking” (p. 265). This quote highlights the complexity of human deception, with the possibility for deception to be encoded in verbal as well as nonverbal behaviours.

Psychologists have studied many aspects of human deception, including the age at which deception first appears (Polak & Harris, 1999; Sodian, Taylor, Harris, & Perner, 1991; Talwar & Lee, 2002), the reasons why people lie (DePaulo, Kashy, Kirkendol, Wyer, & Epstein, 1996; Kashy & DePaulo, 1996), how frequently they lie (Cole, 2001; Serota, Levine, & Boster, 2010; Tyler, Feldman, & Reichert, 2006), and the types of people they tend to deceive (DePaulo & Kashy, 1998). Psychologists, legal authorities, and security personnel have been particularly interested in the ability to detect lies (see Vrij, 2008, for an overview of several different lie detection methods). While these methods have met with varying degrees of success, many of the approaches have been criticised for lacking a clear and reliable theoretical rationale (National Research Council, 2003; Vrij, 2008). This has led to a resurgence of research investigating the psychology of deception.

Good Liars and Poor Liars: Theoretical Perspectives

In order to understand why some people may be better at lying than others, we must first consider what hinders liars and what they need to overcome in order to lie successfully. The most influential theoretical perspective as to why liars may differ from truth-tellers is Zuckerman, DePaulo, and Rosenthal’s (1981) multi-factor model of deception. They argued that while no one behaviour, or set of behaviours, would ever occur solely during deception, the behavioural displays of liars may systematically differ from those of truth-tellers because liars tend to differentially experience three factors¹. Specifically, they argued that liars, relative to truth-tellers, tend to experience higher levels of (1) emotional arousal and (2) cognitive load, which, if left untempered, manifest behaviourally. For instance, a liar who is afraid of getting caught may manifest signs of fear, such as increased perspiration, a trembling/shaking in their extremities, and/or a quaver in their tone of voice. These behavioural manifestations, or cues to deception, may signal to observers that a person is lying. The liar may be aware of such cues, however, and thus try to moderate them via wilful (3) behavioural control so as to conceal their deception. In the case of the fearful liar, they may fold their arms and/or legs so as to minimise any observable trembling and/or try to smooth their tone of voice in an attempt to appear calm and collected. Importantly, this wilful behavioural control may not always be successful, with underlying internal states thought to leak out, often through nonverbal behaviours (DePaulo, 1992; Ekman & Friesen, 1969).

¹ In their original article, Zuckerman et al. (1981) distinguished between generalized physiological arousal and specific affective arousal. For the purposes of this thesis, both will be discussed in terms of emotional arousal.

While Zuckerman et al.’s (1981) multi-factor theory predicts that the more liars experience emotional arousal, cognitive load, and/or behavioural control, the more likely they are to manifest cues to deception, DePaulo (1992) reasoned that these processes are not unique to deceptive communication. In the Self-Presentational Perspective (SPP; DePaulo, 1992), people are thought to routinely engage in impression management, often seeking to foster a certain perception of themselves in the minds of others by purposefully regulating their expressive behaviours. Impression management, however, is thought to be a difficult task, with the association between the intended perception and that actually experienced by the conversational partner contingent upon one’s ability to accurately produce the desired behavioural displays and whether the displays actually foster the intended perception. According to DePaulo, verbal behaviours may be easily controlled and are mutually experienced by both parties; thus, the purposeful regulation of verbal behaviours often facilitates one’s self-presentational goals. Some nonverbal behaviours, on the other hand, are thought to be more difficult to control as they are typically inaccessible to those who produce them. DePaulo argued that people are generally less aware of the perceptions that their nonverbal behaviours create in others as they rarely have the opportunity to observe their own face or body during social interactions. DePaulo also argued that even if a person could see and hear their nonverbal behaviours as others do, they may still not be able to accurately produce the desired nonverbal expression as it may be subject to automaticity, with the precise components of the expression unknown to the conscious mind. Furthermore, DePaulo argued that certain nonverbal behaviours may be intrinsically linked to certain emotions, making the production of these nonverbal expressions difficult in the absence of emotion or the suppression of these nonverbal expressions difficult in the presence of emotion. For these reasons, DePaulo concluded that the purposeful regulation of nonverbal behaviours often impairs one’s self-presentational goals. In the context of deception, the SPP embraces Zuckerman et al.’s (1981) multi-factor model, with liars thought to experience higher levels of emotional arousal, cognitive load, and behavioural control than truth-tellers. This is thought to occur because liars are less likely to take their credibility for granted (Vrij, 2008).

While the SPP considers the task of deception from the perspective of the communicator, Interpersonal Deception Theory (IDT; Buller & Burgoon, 1996) emphasizes the active nature of social interactions, arguing that people are capable of adapting their verbal and nonverbal presentations based on interpersonal feedback. Contrary to the SPP, IDT claims not only that the wilful control of verbal and nonverbal behaviours is possible, but that it often serves to facilitate interpersonal goals. Specifically, IDT asserts that when people want to be perceived as credible they tend to pay more attention to their conversational partner’s cues of suspicion and actively modify their behaviour in response to such information. This behavioural adaptation is thought to assist in allaying the conversational partner’s suspicions. In the context of deception, IDT also embraces Zuckerman et al.’s (1981) multi-factor model; however, it predicts that the net effect of liars’ differential experience of emotional arousal, cognitive load, and behavioural control tends to be positive rather than negative.

According to Vrij (2008), whether, and to what extent, liars actually experience emotional arousal, cognitive load, and/or behavioural control depends on the characteristics of the liar and the circumstances under which the lie takes place. This implies that the mere fact that somebody lies does not necessarily mean that they will manifest behavioural cues to deception. It does suggest, however, that the more liars experience one or more of these underlying factors, the more likely cues to deception are to occur.

With regard to individual differences in the ability to deceive, the multi-factor theory, the SPP, and IDT all predict that people who experience less emotional arousal and/or cognitive load while lying would manifest fewer diagnostic deception cues and would therefore be more likely to succeed with their lies. Furthermore, all three theories predict that people who are more able to regulate emotional arousal and/or cognitive load while lying would also be more likely to succeed with their lies, as would those who are more able to wilfully control their behavioural displays and match them to the beliefs of their conversational partner. The interaction between the demands of the deceptive situation and a person’s ability to regulate these demands will be discussed later in this chapter.


Good Liars and Poor Liars: Behavioural Correlates and Credibility Judgements

There has been a staggering amount of empirical research investigating whether people actually behave differently when they are lying relative to when they are telling the truth. While Zuckerman et al. (1981) were the first to quantitatively summarize this research, several more comprehensive summaries have been published since. In particular, DePaulo et al. (2003) combined the results of 1,338 estimates of 158 cues to deception. They reported that liars tend to be less forthcoming and tell less compelling stories than truth-tellers. They also reported that liars tend to make more negative impressions and tend to appear more nervous and tense than truth-tellers. Importantly, while some of the behavioural correlates used to assess these global constructs were statistically significant, the effect sizes tended to be small, with most of the cues examined in the meta-analysis showing weak and non-significant relationships with deception. In other words, no one cue emerged as highly diagnostic of deception.

The finding that liars tend to manifest only weak cues to deception is further supported by detection studies where participants are presented with a selection of truthful and deceptive messages and tasked with discerning truth from lie. Detection studies typically involve the recruitment of two independent samples of participants: a sample of senders (participants tasked with producing truthful and/or deceptive messages) and a sample of receivers (participants tasked with evaluating the veracity of sender messages). When these types of studies are conducted it is common for the average receiver accuracy rate (the proportion of truths and lies correctly identified by receivers) to be only slightly above chance performance (Levine, 2010; Vrij, 2008). Specifically, Bond and DePaulo (2006) quantitatively summarized the results of 384 independent samples, reporting that the average receiver accuracy rate was 54%. They also reported that receivers tend to have higher accuracy rates for truthful messages than for deceptive messages, with receivers correctly identifying 61% of truthful messages and only 47% of deceptive messages. This is thought to occur because people tend to have a bias towards judging others as truthful, an effect that has been called ‘the veracity effect’ (Levine, Park, & McCornack, 1999).

It is important to point out that the evidence from detection studies provides only indirect support for the notion of limited deception cues. It may be the case that receivers’ beliefs about sender behavioural displays are misguided, causing them to use behavioural displays that are unrelated to deception rather than those that are actually associated with deception (Levine, 2010). For the purpose of the current thesis, the connection between sender behavioural displays and how they are interpreted by receivers will be referred to as the diagnosticity of behavioural displays. In the context of deception research, the diagnosticity of behavioural displays usually refers to how well certain behaviours discriminate between truthful and deceptive messages (the rate at which certain behaviours occur more/less often during deception than truth-telling). For instance, in the event that liars take longer to initiate their responses than truth-tellers, the diagnosticity of response latencies would usually be regarded as high (response latencies discriminate between truthful and deceptive messages). As the primary focus of the current thesis is on the connection between behavioural displays and how they are interpreted by receivers, however, the diagnosticity of response latencies would only be high if receivers are sensitive to response latencies and correctly associate longer lag times with deception. In other words, while behavioural differences are a necessary condition for high diagnosticity (there must be certain behaviours that occur more/less often during deception than truth-telling), in isolation their occurrence is not a sufficient condition for high diagnosticity. Behavioural displays only become diagnostic when they are correctly identified by receivers and result in more accurate veracity judgements.

While the evidence indicates that average receiver accuracy rates tend to be only slightly better than chance (Bond & DePaulo, 2006; Levine, 2010; Vrij, 2008), recent research has focused on the variability in accuracy rates. That is, recent research has focused on whether some receivers are better at discriminating between truthful and deceptive messages than others.
In an attempt to estimate the extent to which accuracy rates vary among receivers, Bond and DePaulo (2008) conducted another comprehensive meta-analysis of the deception literature. They argued that the true variance in receiver accuracy rates could be estimated by a statistical model which relates each study’s standard deviation score (in terms of the standard deviation in receiver accuracy rates) to the number of sender messages judged in each study. They reported that after correcting for the nominal differences introduced by random measurement error, the measurement-corrected standard deviation (in terms of percentage correct scores) for receiver accuracy was 0.80%. This estimate suggests that the ability to discriminate between truthful and deceptive messages hardly varies from person to person. That is, the estimate suggests that there are minimal individual differences in the ability to detect deception.
In addition to estimating the true variability in receiver accuracy rates, Bond and DePaulo (2008) also estimated the true variability in sender accuracy rates. While receiver accuracy rates refer to the judgements made by each receiver, sender accuracy rates refer to the judgements that apply to each sender. In other words, sender accuracy rates refer to how well receivers discriminated between each sender’s truthful and deceptive messages. Bond and DePaulo (2008) reported that after correcting for random measurement error, the measurement-corrected standard deviation (in terms of percentage correct scores) for sender accuracy was 5.49%. This estimate suggests that the rate at which receivers discriminate between a sender’s truthful and deceptive messages does vary from person to person. That is, the estimate suggests that some people are better liars than others.
While the results regarding the variability in accuracy rates indicated that there are larger individual differences in the ability to deceive than in the ability to detect deception, Bond and DePaulo (2008) also estimated the variability in both judge and sender bias rates. Whereas accuracy refers to the rate at which judgements are correct, bias refers to the rate at which judgements occur in one direction (the aforementioned veracity effect). Bond and DePaulo (2008) reported that after correcting for random measurement error, the measurement-corrected standard deviation for judge bias (in terms of the percentage of messages classified as truthful) was 5.13%. This estimate suggests that some people have a tendency to believe messages while others have a tendency to disbelieve messages.
While this estimate was moderate in size, the reported measurement-corrected standard deviation for sender bias (in terms of the percentage of messages classified as truthful) was 11.58%, over twice as large as the estimate for judge bias. This suggests that some people have a strong tendency to be believed while others have a strong tendency to be disbelieved. Bond and DePaulo (2008) concluded that their meta-analytic estimates demonstrate that the outcome of a veracity judgement depends more on sender characteristics than on receiver characteristics.
Further evidence that sender effects are influential sources of variation in veracity judgements comes from Levine et al. (2011). In a series of experiments they systematically varied which senders/messages different samples of receivers evaluated. To manipulate sender demeanour, a pilot panel of lay receivers viewed 44 recorded messages (22 lies and 22 truths) and made veracity judgements. The resulting decision-level data was then aggregated by-sender (the receiver judgements that corresponded to each sender were averaged) and the participant-level accuracy scores were used to rank-order the truth-tellers and liars. These ranks were then used to create different stimulus sets. The demeanour-veracity matched set contained the five most often believed truth-tellers and the five least frequently believed liars, whereas the demeanour-veracity mismatched set contained the five least frequently believed truth-tellers and the five most often believed liars. These sets were then shown to five different samples of receivers: two undergraduate student samples, a faculty member sample from a U.S. university, a student sample from a university in Seoul, South Korea, and a sample of practising deception detection experts employed by a U.S. security and intelligence agency. The results revealed that, without exception, the demeanour-veracity matched set produced the highest receiver accuracy rates (ranging from 70.7% to 100% accuracy) while the demeanour-veracity mismatched set produced the lowest receiver accuracy rates (ranging from 20.4% to 41.4%). This means that the senders who were perceived to be credible by one sample of receivers were also perceived to be credible by another sample of receivers and that this effect was somewhat independent of actual message veracity. When a credible sender’s message was truthful the receivers tended to correctly identify the message. This was because the sender’s demeanour matched the veracity of their message. When a credible sender’s message was deceptive, however, the receivers tended to incorrectly identify the message. This was because the sender’s demeanour did not match the veracity of their message.
These results are consistent with the findings by Bond and DePaulo (2008) and demonstrate that sender effects have a powerful impact on veracity judgements.
While there has been a considerable amount of research investigating the average differences between truth-tellers and liars (in terms of behavioural differences and detection rates), far less research has adopted an individual differences perspective. That is, there has been little research investigating the factors that influence whether or not a person will display cues to deception and the factors that make one person a better liar than another. This is surprising as the results reported by Bond and DePaulo (2008) and Levine et al. (2011) demonstrate that sender characteristics are central to the outcome of a veracity judgement. The following section examines how the ability to deceive may be influenced by a sender’s ability to regulate the demands associated with truth-telling and deception.

Good Liars and Poor Liars: The Interaction Between Task Demands and Sender Abilities

In an experimental context, the ability to deceive is usually operationalized by taking the difference between two credibility ratings. One of these ratings reflects how credible the sender was rated in a truthful condition while the other reflects how credible they were rated in a deceptive condition. The difference between these two ratings is commonly referred to as a detectability score. Theoretically, each of these ratings is influenced by two interacting factors: (1) the demands associated with the particular condition and (2) the sender’s ability to regulate these demands2. To clarify how the interaction between these two factors may ultimately influence the ratings for a given condition, it is helpful to consider the results from a hypothetical experiment. The hypothetical experiment comprises a 2 x 2 mixed design with one between-subjects factor (cognitive ability; low vs. high) and one within-subjects factor (condition; truthful vs. deceptive). To simplify the example the truthful and deceptive conditions only differ in one respect, with the deceptive condition inducing higher levels of cognitive load than the truthful condition. The between-subjects factor refers to the ability to distribute cognitive load, with the high ability group more able to distribute cognitive load than the low ability group. The experiment is framed from the perspective of the sender (the sender scores are the unit of analysis).
In the hypothetical experiment, the truthful condition induced only a small amount of cognitive load. As a result, the low ability group completed the condition without observable signs of cognitive strain3. Similarly, the high ability group also completed the condition without observable signs of cognitive strain.
As the demands of the truthful condition were insufficient to cause any observable differences between the two groups the receivers were unable to discriminate between the two sets of truthful messages. Consequently, both groups received the same average rating for the truthful condition (e.g. five). Importantly, this meant that when the cognitive demands associated with the condition were small, the ability to distribute cognitive load was unrelated to the ratings (the low ability group had the same average rating as the high ability group)4.

2 Credibility ratings are not exclusively influenced by these factors.
3 The precise mechanism whereby task demands interact with sender abilities to determine task performance is discussed in greater detail in Chapter 4.

In contrast to the truthful condition, the deceptive condition of the hypothetical experiment induced a moderate amount of cognitive load. As a result, the low ability group were unable to complete the deception without observable signs of cognitive strain. Importantly, the receivers were sensitive to these signs of cognitive strain and correctly associated them with deception (the diagnosticity of the behavioural cues was high). This meant that the low ability group’s average rating for the deceptive condition (e.g. four) was lower than their average rating for the truthful condition. While the low ability group were impaired by the moderate amount of cognitive demand associated with the deceptive condition, the high ability group were still able to distribute the demands associated with the condition without displaying observable signs of cognitive strain, just as they had been able to do in the truthful condition. Consequently, the receivers were unable to discriminate between their truthful and deceptive messages. They therefore received the same average rating for the deceptive condition (five) as they received for the truthful condition. Importantly, this meant that when the cognitive demands of the condition were moderate, the ability to distribute cognitive load was positively related to the ratings (the low ability group had a lower average rating than the high ability group).
This hypothetical experiment demonstrates how the demands associated with a given condition interact with the relevant sender abilities to ultimately influence the credibility ratings.
Specifically, when the cognitive demands of the condition were small (in the truthful condition), both groups were able to distribute the cognitive load without observable signs of cognitive strain. As a result, the ability to distribute cognitive load was unrelated to the ratings for the condition. When the cognitive demands of the condition were moderate (in the deceptive condition), however, the low ability group were no longer able to distribute the cognitive load without observable signs of cognitive strain while the high ability group were. Consequently, the ability to distribute cognitive load was positively related to the ratings for the condition. This is important because it is the difference between these two zero-order relationships that determines the degree to which sender abilities are related to detectability scores. In the hypothetical experiment, the low ability group had an average detectability score of one (their average rating for the truthful condition was five while their average rating for the deceptive condition was four) while the high ability group had an average detectability score of zero (their average rating for the truthful condition was five, as was their average rating for the deceptive condition). The difference between these scores represents the difference between two zero-order relationships (whether the ability to distribute cognitive load was more strongly related to the ratings for the deceptive condition than to the ratings for the truthful condition). In the hypothetical experiment presented here, the low ability group has a larger average detectability score than the high ability group, thus the ability to distribute cognitive load was negatively related to the detectability scores.

4 Strictly speaking, there would be no relationship between the ability to distribute cognitive load and the ratings for a given condition only if the ability to distribute cognitive load was entirely unrelated to the factors that actually did influence the ratings (that is, only if the ability to distribute cognitive load had no indirect effects). For the sake of clarity the potential for indirect effects is not discussed here and all covariances are assumed to be zero.

Importantly, the fact that the deceptive condition in the hypothetical experiment induced more cognitive load than the truthful condition does not necessarily mean that the ability to distribute cognitive load would be related to the detectability scores. To clarify this point it is helpful to consider the results from another hypothetical experiment. While the design of the new experiment was the same as the old experiment, the truthful condition of the new experiment induced a moderate amount of cognitive load, similar to the deceptive condition of the old experiment.
Accordingly, the low ability group displayed the same amount of observable signs of cognitive strain in the truthful condition of the new experiment as they displayed in the deceptive condition of the old experiment. As a result, the receivers were unable to discriminate between the two sets of messages, with the low ability group receiving the same average rating for the truthful condition of the new experiment (four) as they received for the deceptive condition of the old experiment. While the low ability group were impaired by the moderate amount of cognitive demand, the high ability group were able to complete the truthful condition of the new experiment without observable signs of cognitive strain, just as they had completed the deceptive condition of the old experiment. They therefore received the same average rating (five) that they previously received. Importantly, like the deceptive condition in the old experiment, when the cognitive demands of the condition were moderate, the ability to distribute cognitive load was positively related to the ratings (the high ability group had a higher rating than the low ability group).
In addition to inducing more cognitive load in the truthful condition than the old experiment, the new experiment also induced more cognitive load in the deceptive condition. Specifically, the deceptive condition of the new experiment induced a large amount of cognitive load. As a result, the low ability group displayed even more behavioural signs of cognitive strain than they did in the deceptive condition of the old experiment. They therefore received an even lower average rating (three) for the deceptive condition of the new experiment. In contrast to the old experiment, the high ability group were unable to complete the deceptive condition of the new experiment without observable signs of cognitive strain. This meant that the receivers were able to discriminate between these two sets of messages, thus the high ability group received a lower average rating (e.g. four) for the deceptive condition of the new experiment than they received for the deceptive condition of the old experiment. Importantly, when the cognitive demands of the condition were large, the ability to distribute cognitive load was positively related to the ratings. The results from both the hypothetical experiments are summarized in Table 1.

Table 1
Summary of Results from Two Hypothetical Deception Detection Experiments

Average Ratings of Credibility

                First (Old) Hypothetical Experiment       Second (New) Hypothetical Experiment
                Truthful            Deceptive             Truthful            Deceptive
                Condition           Condition             Condition           Condition
                (Small              (Moderate             (Moderate           (Large
                Cognitive Load)     Cognitive Load)       Cognitive Load)     Cognitive Load)
Low Ability     5                   4                     4                   3
High Ability    5                   5                     5                   4

Difference Between the Two Zero-Order Relationships
First (Old) Hypothetical Experiment:    (5 – 5) – (5 – 4) = -1
Second (New) Hypothetical Experiment:   (5 – 4) – (4 – 3) = 0

Note: When the low ability group has a larger zero-order relationship than the high ability group, the ability to distribute cognitive load is negatively related to the detectability scores. When there is no difference between the two zero-order relationships, however, the ability to distribute cognitive load is unrelated to the detectability scores.
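
The arithmetic summarized in Table 1 can be checked with a short sketch. The Python below is illustrative only: the data layout and the function names (detectability, ability_gap, difference_between_relationships) are assumptions introduced here rather than part of the original analysis, and the high-minus-low gap in group means is used as a simple stand-in for each zero-order relationship.

```python
# Illustrative sketch: average credibility ratings copied from Table 1.
# The data layout and all function names are hypothetical.
RATINGS = {
    "old": {
        "low": {"truthful": 5, "deceptive": 4},
        "high": {"truthful": 5, "deceptive": 5},
    },
    "new": {
        "low": {"truthful": 4, "deceptive": 3},
        "high": {"truthful": 5, "deceptive": 4},
    },
}


def detectability(group):
    """Detectability score: truthful rating minus deceptive rating."""
    return group["truthful"] - group["deceptive"]


def ability_gap(experiment, condition):
    """High-minus-low rating gap within one condition.

    Used here as a stand-in for the zero-order relationship between
    ability and ratings in that condition.
    """
    return experiment["high"][condition] - experiment["low"][condition]


def difference_between_relationships(experiment):
    """Truthful-condition gap minus deceptive-condition gap."""
    return ability_gap(experiment, "truthful") - ability_gap(experiment, "deceptive")


for name, experiment in RATINGS.items():
    low = detectability(experiment["low"])
    high = detectability(experiment["high"])
    diff = difference_between_relationships(experiment)
    # The same quantity can be read as the high group's detectability
    # score minus the low group's detectability score.
    assert diff == high - low
    print(f"{name}: low={low}, high={high}, difference={diff}")
```

Under these assumptions the difference is -1 for the old experiment and 0 for the new one, matching the last rows of Table 1. The in-loop assertion also shows that the difference between the two condition-level gaps is algebraically the same as the difference between the two groups' detectability scores.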


In the new hypothetical experiment, the ability to distribute cognitive load was unrelated to the detectability scores. This is because the zero-order relationship for the truthful condition is the same as the zero-order relationship for the deceptive condition. Specifically, the low ability group had a detectability score of one (their average rating for the truthful condition was four while their average rating for the deceptive condition was three), as did the high ability group (their average rating for the truthful condition was five while their average rating for the deceptive condition was four). While the high ability group were perceived as more credible than the low ability group in both the truthful and deceptive conditions, the ability to distribute cognitive load was equally related to the ratings for the truthful condition as it was to the ratings for the deceptive condition. That is, the high ability group had the same mean detectability score as the low ability group. This demonstrates that just because a deceptive condition places greater demands on sender abilities than a truthful condition, it does not necessarily mean that the relevant sender abilities will be related to the detectability scores. Rather, it is the difference between the two zero-order relationships that determines the degree to which sender abilities are related to detectability scores.
While deceptive abilities are theoretically influenced by the sender’s ability to regulate the demands associated with truth-telling and deception, few studies have empirically examined this relationship. The next few chapters of this thesis present a critical review of the research that has been conducted in this area. Specifically, Chapter 2 reviews the research pertaining to the effect of sender motivation, Chapter 3 reviews the research pertaining to the effect of personality traits and social skills, and Chapter 4 reviews the research pertaining to the cognition of deception.


SECTION 2: CRITICAL REVIEWS

Chapter 2

The Influence of Sender Motivation

Previous research has investigated whether highly motivated liars are more successful in their deception attempts than less motivated liars. It is important to review this research because there are conflicting theoretical predictions about when high levels of sender motivation affect behavioural displays (either during deception, or during deception and truth-telling) and the consequences of any such changes in behavioural displays (whether changes result in increased or decreased receiver judgements of credibility). It is also important to review this research because the relationship between sender motivation and the ability to deceive concerns the differential experiencing/regulation of emotional arousal, cognitive load, and/or behavioural control (highly motivated senders are thought to experience more of these factors). These are important considerations when the aim of research is to investigate the factors that contribute to the ability to deceive, as it is in this thesis.
The majority of the research investigating whether sender motivation is related to the ability to deceive has been conceptualized and interpreted through the lens of DePaulo’s (1992) Self-Presentational Perspective (SPP). The SPP predicts that highly motivated liars attempt to control their expressive behaviours more than less motivated liars, resulting in more/greater nonverbal cues to deception. Furthermore, the SPP contends that highly motivated liars tend to have more fear of being caught than less motivated liars, resulting in higher levels of emotional arousal. The SPP also claims that highly motivated liars tend to put more effort into telling their stories and may monitor the reactions of receivers more carefully to assess their level of suspicion than less motivated liars, resulting in higher levels of cognitive load.
These differences (higher levels of emotional arousal, cognitive load, and behavioural control) are thought to increase the diagnosticity of behavioural displays during deception, with the deceptive messages of highly motivated liars predicted to be more detectable than those produced by less motivated liars. This effect has been labelled the Motivational Impairment Effect (MIE; DePaulo, Lanier, & Davis, 1983).
A limitation of the literature on the MIE is that it focuses on the influence of sender motivation during instances of deception and overlooks the influence of sender motivation during instances of truth-telling5. In certain situations, truth-tellers may also be highly motivated to foster a credible impression. When this is the case, truth-tellers may tend to control their expressive behaviours in an attempt to convince the receiver that they are indeed telling the truth. Highly motivated truth-tellers may also have more fear over being falsely disbelieved than less motivated truth-tellers, resulting in higher levels of emotional arousal. Just as a highly motivated liar might do, highly motivated truth-tellers may also put more effort into telling their stories and may monitor the reactions of the receiver more carefully to assess their level of suspicion than less motivated truth-tellers, resulting in higher levels of cognitive load. The motivated liar, however, may still experience higher levels of cognitive load than the motivated truth-teller due to the additional cognitive tasks/heavier processing requirements associated with producing a deceptive message (Sporer & Schwandt, 2006; Sporer & Schwandt, 2007; Zuckerman et al., 1981)6.
Contrary to the MIE literature, Buller and Burgoon’s (1996) Interpersonal Deception Theory (IDT) predicts that when sender motivation is high, senders tend to successfully exert more control over their expressive behaviours, resulting in higher receiver perceptions of truthfulness. While IDT agrees that highly motivated liars tend to pay more attention to receiver cues of suspicion than less motivated liars, resulting in higher levels of cognitive load, IDT argues that this increased attention allows the liar to appropriately modify their expressive behaviours and thus allay receiver suspicions. Importantly, according to IDT, the facilitative effect of sender motivation is not restricted to instances of deception, but also occurs during instances of truth-telling.
That is, IDT predicts that sender motivation is positively associated with receiver judgements of credibility and that the effect is independent of message veracity. To ascertain the evidence for each of these different theoretical perspectives and gain a better understanding of how sender motivation influences deceptive abilities, the following section presents a critical review of the relevant empirical research.

5 This limitation pertains to the literature on the MIE and not the literature on the SPP. While the two literatures are strongly related to one another, the MIE literature focuses on the perspective of a liar whereas the SPP literature considers both the perspective of a liar and that of a truth-teller. This is an important distinction as the predictions from the MIE literature do not necessarily coincide with those from the SPP literature.
6 The cognition of deception is discussed in more detail in Chapter 4.


The Influence of Sender Motivation: The Empirical Evidence

Krauss (1981) offered the first published report of an empirical investigation into the relationship between sender motivation and perceptions of credibility. Krauss and colleagues had participant-senders complete several recorded interviews where they were instructed to lie and tell the truth to various questions about their personal beliefs and plans for the future. In an attempt to manipulate sender motivation, half the participant-senders were told that the ability to deceive successfully was related to intellectual and creative ability, as well as future career success, and that their recorded interviews would later be evaluated by a team of psychiatrists (the high motivation condition). The other half of the participant-senders did not receive any special instructions regarding the interview (the low motivation condition). In addition to the two sets of instructions given to the participant-senders, the availability of their visual behaviours was also manipulated by having half of the interviews conducted over an intercom, with the participant-sender and interviewer situated in separate rooms (the intercom condition), while the other half of the interviews were conducted in person, with the participant-sender and interviewer situated in the same room (the face-to-face condition). After each question, the interviewer rated how truthful they thought the participant-sender had been on a seven-point scale. The author reported that accuracy (calculated by subtracting the sum of the ratings when the participant-senders were being deceptive from the sum of the ratings when the participant-senders were being truthful) did not depend on motivation condition, with both the high and low motivation conditions exhibiting approximately the same level of accuracy.
To further investigate whether the motivation manipulation influenced perceptions of credibility and whether any effects depended on which behavioural channels receivers were exposed to, some of the recorded interviews were subsequently shown to an independent sample of naïve receivers where they were rated for veracity in one of three conditions: an audio only condition, a video only condition and an audio-visual condition. While the motivation manipulation did not seem to influence the ratings of truthfulness provided by the interviewers in the first phase of the study, Krauss (1981) reported that when the ratings were made by those watching the recordings (the naïve receivers), accuracy was greater in the high motivation condition than in the low motivation condition. Furthermore, while this effect was observed in each of the three rating conditions, the effect was larger for interviews conducted over the intercom than for those conducted face-to-face.
However, several methodological and statistical issues restrict the conclusions that can be drawn from these results. Importantly, Krauss (1981) noted that the ANOVA used to analyse the data was “large and complex” and that it “produced enough significant main and interaction effects to be both gratifying and distressing” (p. 330). This comment is particularly troubling as several specifics regarding the data analysis were not included in the original publication, thus a comprehensive critique of the results and the inferences drawn from them is not possible. One can, however, infer some of the specifics from the design of the experiment. Presumably, the ANOVA used to analyse the data included at least three factors: a motivation factor with two levels (low motivation vs. high motivation), an interview type factor with two levels (intercom vs. face-to-face), and a rating condition factor with three levels (audio vs. video vs. audio-visual)7. This specification produces three main effects, three two-way interactions, and a single three-way interaction, with sixty-six pairwise comparisons available for testing8. While not specifically stated in the original publication, Krauss (1981) appears to adopt a data-driven exploratory approach to data analysis, rather than a theoretically guided confirmatory approach. With sixty-six possible pairwise comparisons available for testing, exploratory data analysis is problematic as it considerably raises the likelihood of a false discovery (Cohen, Cohen, West, & Aiken, 2003). That is, as the number of groups increases the likelihood that some pairs of means may appear to be different increases, even if all of the population means are equal (Agresti & Finlay, 1997).
While a variety of corrective procedures are available to adjust conventional decision thresholds for the number of tests conducted, many of the specifics regarding the details of the tests were omitted from the original publication. This means it is difficult to ascertain whether any adjustments were applied to the results, thus there is no way of determining which findings may be considered robust and likely to replicate with new samples.

7 While the original publication stated that the analysis included several control variables from both the first and second phases of the study, the details regarding these variables were omitted. For the sake of clarity they are not considered here.
8 The number of pairwise comparisons was calculated using the formula k(k - 1)/2, where k is the number of cell means (Agresti & Finlay, 1997).
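
The count of sixty-six comparisons, and the concern about false discoveries that follows from it, can be illustrated with a brief sketch. The Python below assumes the three-factor design inferred above (2 x 2 x 3), a conventional per-test threshold of .05, and statistically independent tests; none of these details is confirmed by the original publication, and the Bonferroni adjustment is shown only as one example of a corrective procedure, not as the procedure Krauss (1981) did or should have used. All variable names are hypothetical.

```python
from math import comb

# Assumed design, inferred from the description of the experiment:
# 2 (motivation) x 2 (interview type) x 3 (rating condition).
factor_levels = {"motivation": 2, "interview_type": 2, "rating_condition": 3}

n_cells = 1
for levels in factor_levels.values():
    n_cells *= levels

# Number of pairwise comparisons among k cell means: k(k - 1) / 2.
n_pairs = n_cells * (n_cells - 1) // 2
assert n_pairs == comb(n_cells, 2)

alpha = 0.05  # conventional per-test threshold (an assumption here)

# Chance of at least one false positive if every null hypothesis were true
# and all tests were independent. Pairwise tests that share cells are not
# independent, so this is only a rough illustration.
familywise_error = 1 - (1 - alpha) ** n_pairs

# Bonferroni-adjusted per-test threshold, one example of a correction.
bonferroni_alpha = alpha / n_pairs

print(n_cells, n_pairs)
print(round(familywise_error, 3), bonferroni_alpha)
```

With twelve cell means the formula gives sixty-six comparisons. Under the independence assumption, the chance of at least one spurious difference across that many uncorrected tests is very high (roughly .97), which is why the absence of reported corrections matters for interpreting the results.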


In addition to the problems associated with the uncorrected exploratory analyses, the motivation manipulation itself may have been problematic. As the motivation manipulation was employed between-senders, with the data aggregated by-receiver (each receiver’s accuracy score was calculated by subtracting the sum of their ratings when the messages were deceptive from the sum of their ratings when the messages were truthful), it may be the case that the observed between-group differences reflect nothing more than sampling error associated with the participant-senders. That is, simply by chance, the participant-senders allocated to the motivated condition may have been worse liars or more sincere truth-tellers than those allocated to the control condition (Watkins & Martire, 2015). As standard by-receiver analyses do not incorporate the sampling variability associated with senders, the inferences drawn from the results are restricted to the particular sample of participant-senders used in the study9.
Interpreting the results reported by Krauss (1981) through the lens of the SPP, DePaulo et al. (1983) argued that the senders in the high motivation condition may have been betrayed by their nonverbal behaviours. They hypothesized that the liars in the high motivation condition may have manifested more cues to deception in the nonverbal channel, resulting in lower truthfulness ratings when the receivers had access to the nonverbal channel. To test their predictions they had participant-senders (N = 32) answer four questions (two truthful and two deceptive) in front of a panel of six participant-receivers. The participant-senders were also given advance knowledge regarding two of the questions and given two minutes to plan one truthful and one deceptive response. The six participant-receivers evaluated each message in terms of deceptiveness, planning, and tension on three 7-point scales. Similar to the Krauss (1981) study, DePaulo et al.
(1983) manipulated sender motivation by providing participant-senders with one of two information sheets outlining several fictional research findings. Those in the high motivation group read that deceptive skill was associated with career success and that during the next phase of the experiment they would be “on trial” and that a panel of their peers would be carefully scrutinizing their behaviour and evaluating their truthfulness. They were also told that the interviews would be recorded and shown to another group of receivers where their behaviours could be studied even more carefully. Those in the low

9 This issue will be discussed in greater detail in Chapter 8.

Chapter 2: The influence of sender motivation

motivation condition read several examples of innocuous lies and were told that the next phase of the experiment was a game where they had to lie and tell the truth while others tried to determine the veracity of their messages. Those in the low motivation condition were also told that while the interviews would be recorded there were no plans to use the recordings in the future.

With regard to the ratings of deceptiveness, while the authors reported that the deceptive messages were rated as more deceptive than the truthful messages, the difference between the two mean deception ratings did not depend on motivation condition. The authors argued that because the participant-receivers had access to the full array of sender behaviours, any differences in nonverbal behaviours may have been counteracted by differences in verbal behaviours. That is, they argued that while the participant-senders in the high motivation condition may have displayed more nonverbal cues to deception than those in the low motivation condition, their verbal displays may have been more convincing, with the negative effect of the nonverbal cues being cancelled out by the positive effect of the verbal behaviours.

As the authors hypothesized that motivated senders may be betrayed by their nonverbal behaviours, but not by their verbal behaviours, they argued that appropriate deceptiveness ratings must be influenced only by each sender’s nonverbal behaviours. To this end, the recorded interviews were later shown to an additional sample of naïve participant-receivers (N = 64), who rated them on the same three scales of deceptiveness, planning, and tension in one of four rating conditions: a visual only condition, a verbal only condition, an audio only condition, and an audio-visual condition.
Similar to the results from the first phase of the study, the results from the second phase indicated that while the deceptive messages were rated as more deceptive than the truthful messages, the difference between the two mean deception ratings did not depend on motivation condition. The authors did report, however, that the three-way interaction between message veracity, motivation condition, and rating condition was statistically significant. To investigate this interaction further the authors examined the differences in detectability scores. These scores were calculated by subtracting the ratings of deceptiveness for the truthful messages from the ratings of deceptiveness for the deceptive messages. The authors reported that the follow-up tests indicated that the lies in the high motivation condition were more detectable than the lies in the low motivation condition only when the receivers had access to nonverbal channels (in the

visual, audio, and audio-visual conditions). They also reported that the effect was reversed in the verbal only condition, with the lies in the high motivation condition less detectable than the lies in the low motivation condition.

While the authors concluded that the results were consistent with their hypothesis, like the Krauss (1981) study, several limitations temper this conclusion. Primarily, the outcomes from the first phase of the study were analysed with a 2 x 2 x 2 x 2 x 2 ANOVA (32 cells), indicating that there were 496 pairwise comparisons available for testing. This means that we would expect around 25 of these comparisons to be statistically significant at the .05 level even if all the population means were equal. Furthermore, the outcomes from the second phase of the study were analysed with a 2 x 2 x 2 x 2 x 2 x 4 ANOVA (128 cells), indicating that there were 8,128 pairwise comparisons available for testing, with around 406 of these comparisons expected to be statistically significant if the null hypotheses were actually true. While an overall correction for the number of possible comparisons would imply that each comparison was of equal substantive importance, which is clearly not the case, like the Krauss (1981) study, the authors interpreted all the significant findings rather than only those specified a priori. This suggests that the authors adopted an exploratory approach to data analysis, rather than a confirmatory approach, making it difficult to quantify the uncertainty associated with any one particular result. This problem is magnified by the fact that each analysis was conducted three times, each with a different dependent variable (ratings of deceptiveness, planning, and tenseness).

Another limitation of the data analysis conducted by DePaulo et al. (1983) pertains to the authors’ conclusions regarding the differences in the detectability scores across the motivation conditions.
Specifically, while the authors were correct in their interpretation of a single mean detectability score (a positive mean detectability score indicates that on average the deceptive messages were perceived as more deceptive than the truthful messages), their interpretation of a mean difference in detectability scores may have been mistaken. Importantly, a positive mean difference between detectability scores does not necessarily indicate that on average the deceptive messages in the high motivation condition were perceived as more deceptive than the deceptive messages in the low motivation condition, as the authors propose; rather it may indicate that on average the truthful messages in the high motivation condition were perceived as less deceptive than the truthful messages in the low motivation condition.
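
The comparison counts quoted for the two ANOVA designs discussed above can be reproduced with simple combinatorics. The following is a minimal sketch in Python, assuming that every cell of a fully crossed design is compared with every other cell and that each comparison is tested at the .05 level; the function names are illustrative only and do not come from any of the studies reviewed.

```python
from math import comb, prod


def pairwise_comparisons(levels):
    """Return (number of cells, number of distinct pairs of cell means)
    for a fully crossed design with the given numbers of factor levels."""
    cells = prod(levels)
    return cells, comb(cells, 2)


def expected_false_positives(n_comparisons, alpha=0.05):
    """Expected number of significant comparisons if every null is true."""
    return n_comparisons * alpha


designs = {
    "phase 1 (2 x 2 x 2 x 2 x 2)": [2, 2, 2, 2, 2],
    "phase 2 (2 x 2 x 2 x 2 x 2 x 4)": [2, 2, 2, 2, 2, 4],
}

for label, levels in designs.items():
    cells, pairs = pairwise_comparisons(levels)
    print(f"{label}: {cells} cells, {pairs} pairwise comparisons, "
          f"about {expected_false_positives(pairs):.0f} expected at alpha = .05")
```

The expected count is simply the number of comparisons multiplied by the per-comparison error rate, which holds whether or not the comparisons are independent of one another.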


To clarify the problem with interpreting mean differences in detectability scores it is helpful to consider how different hypothetical mean deceptiveness ratings can produce the same mean differences in detectability scores. If the mean deceptiveness rating for the deceptive messages in the low motivation condition was five and the mean deceptiveness rating for the truthful messages in the low motivation condition was also five, then the mean detectability score for the low motivation condition would be zero. If the mean deceptiveness rating for the deceptive messages in the high motivation condition was six and the mean deceptiveness rating for the truthful messages in the high motivation condition was five, then the mean detectability score for the high motivation condition would be one. In this example the mean difference in detectability scores is one, with the difference driven by a change in the mean ratings for the deceptive messages. The same result, however, can also be observed when the mean ratings for the truthful messages change across the motivation conditions. For instance, if the mean deceptiveness rating for the deceptive messages in the low motivation condition was five and the mean deceptiveness rating for the truthful messages in the low motivation condition was also five, then the mean detectability score for the low motivation condition would be zero. If the mean deceptiveness rating for the deceptive messages in the high motivation condition was five and the mean deceptiveness rating for the truthful messages in the high motivation condition was four, then the mean detectability score for the high motivation condition would be one. Like the previous example, the mean difference in detectability scores is also one, but the difference is driven by a change in the mean ratings for the truthful messages rather than a change in the mean ratings for the deceptive messages. 
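
The two hypothetical scenarios described above can be restated as a short calculation. This is a sketch using only the illustrative cell means from the text (they are not data from any study), with a detectability score defined, as in DePaulo et al. (1983), as the mean deceptiveness rating for deceptive messages minus that for truthful messages.

```python
def detectability(cell):
    """Mean deceptiveness rating for lies minus that for truths."""
    return cell["deceptive"] - cell["truthful"]


# Hypothetical mean deceptiveness ratings taken from the worked example.
low_motivation = {"deceptive": 5, "truthful": 5}
high_lies_shift = {"deceptive": 6, "truthful": 5}    # lies rated more deceptive
high_truths_shift = {"deceptive": 5, "truthful": 4}  # truths rated less deceptive

diff_lies = detectability(high_lies_shift) - detectability(low_motivation)
diff_truths = detectability(high_truths_shift) - detectability(low_motivation)

print(diff_lies, diff_truths)  # both differences equal 1
```

Because both scenarios yield the same difference score, the score alone cannot reveal which set of cell means produced it.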
By omitting the message veracity factor and analysing the mean differences in detectability scores it is impossible to determine whether the effect of the motivation manipulation was attributable to a change in the deceptiveness ratings for the truthful messages, the deceptive messages, or both types of messages.

To further explore the conditions under which the predicted motivational impairment effect occurs, DePaulo, Stone, and Lassiter (1985) conducted a study using several different operationalizations of motivation. They hypothesised that a liar’s motivation to succeed would be higher when the target of the deceptive message was of the opposite sex, resulting in lies that would appear less sincere than those told to targets of the same sex. They further hypothesised that a liar’s motivation to succeed


would be even higher when the opposite-sex target was attractive and when the message was ingratiating. To test their hypotheses, they had participant-senders (N = 64) express both truthful and deceptive opinions to a fictitious partner while being recorded. The recordings were later shown to a sample of participant-receivers (N = 271), who rated them for sincerity on a 7-point scale. The messages were rated in one of four conditions: a verbal modality condition (words only, in transcript form), an audio modality condition, a visual modality condition, or an audio-visual modality condition.

To manipulate the truthfulness of the messages (message veracity) and whether the messages were ingratiating or not (message type), each participant-sender was provided with 4 sets of written instructions, one for each opinion to be expressed. Each set of instructions contained 4 pieces of information: (1) the particular issue to be discussed, (2) the participant-sender’s original position on the issue (determined through a preliminary survey), (3) the partner’s position on the issue, and (4) the position that the participant-sender was to attempt to convey. Message type was manipulated by having the partner’s position on the issue made consistent with the position the participant-sender was meant to convey for two of the topics (ingratiating condition) and inconsistent for the other two (non-ingratiating condition). Message veracity was manipulated by having the position the participant-sender was meant to convey made consistent with their original position for two of the topics (truth condition) and inconsistent for the other two topics (lie condition).

In addition to message type and message veracity, sender motivation was also manipulated by varying the sex and attractiveness of the fictitious partners. Along with the 4 sets of instructions, participant-senders were handed a photograph depicting their partner.
The photograph was either of an attractive same-sex partner, an unattractive same-sex partner, an attractive opposite-sex partner, or an unattractive opposite-sex partner¹⁰. In total there were 8 photographs (2 per condition).

The authors reported that the three-way interaction between message veracity, sender sex, and target attractiveness was marginally significant. Once again, they followed up the interaction by analysing detectability scores, reporting that the lies told by female senders were rated as less sincere when the target was attractive compared to

10 The photographs were informally selected from a larger sample of photographs and rated for attractiveness by 108 undergraduates from a psychology course. An analysis of the attractiveness ratings revealed that the attractive photos were rated as significantly more attractive than the unattractive photos.

when the target was unattractive, whereas the lies told by male senders were equally detectable regardless of whether the target was attractive or unattractive. This effect was observed in all rating conditions. The authors also reported that when the sincerity ratings were averaged across the levels of message veracity and sender sex, messages told to attractive targets were rated as less sincere than the messages told to unattractive targets, with this difference being greater when the receivers had access to nonverbal behaviours (in the audio, visual, and audio-visual conditions).

With regard to the effects of target sex, the authors reported that when the target was the same sex as the sender, lies were indistinguishable from truths in all channel conditions, whereas when the target was of the opposite sex, lies were rated as less sincere only when receivers had access to nonverbal behaviours¹¹. Similarly, when the messages were non-ingratiating, the authors reported that lies were indistinguishable from truths in all channel conditions, whereas when the messages were ingratiating, lies were rated as less sincere than truths when receivers had access to visual cues (in the visual and audio-visual conditions). Furthermore, they reported that the ingratiating lies were rated as less sincere than the non-ingratiating lies when the target was attractive, but not when the target was unattractive.

Like the DePaulo et al. (1983) study, the data from the DePaulo et al. (1985) study were submitted to several complex ANOVA models: namely a 2 x 2 x 2 x 2 x 2 x 4 ANOVA where the dependent variable was the by-sender aggregated sincerity ratings, a 2 x 2 x 2 x 2 x 4 ANOVA where the dependent variable was the detectability scores, and a 2 x 2 x 2 x 4 ANOVA where the dependent variable was a mean sincerity score, averaged over the levels of sender sex and message veracity.
Once again it appears as though the authors interpret all the significant findings without correcting for the multiplicity of tests. As previously discussed, this exploratory approach makes it difficult to determine the robustness of the reported findings. Furthermore, the use of detectability scores removes the ability to compare the absolute position of the cell means. That is, the reliance on detectability scores makes it impossible to determine whether the effect of the motivation manipulation was attributable to a change in the deceptiveness ratings for the truthful messages, the deceptive messages, or both types of messages. Like the previous study, it may be the case that the motivation manipulation

11 The authors also reported that the three-way interaction between sender sex, target sex and message veracity was significant, but no follow-up tests or interpretation were reported.

increased the perceived credibility of the truthful messages, rather than decreasing the perceived credibility of the deceptive messages. In addition to the statistical issues, target sex and target attractiveness were manipulated between-subjects using a restricted sample of photographs (2 photos per condition). This means that the between-group differences may not be due to differences in target sex and target attractiveness per se (inferred to cause differences in sender motivation); rather they may be due to other unmeasured features of the particular photographs. The restricted sample size does not allow the other features of the photographs to be randomly distributed within each condition.

To more thoroughly examine one of the underlying assumptions regarding the MIE, namely that attempts to control nonverbal behaviours result in impaired deceptive performance, DePaulo, Kirkendol, Tang, and O’Brien (1988) recorded participant-senders (N = 131) while they expressed opinions to a fictitious partner while also attempting to control one of seven different behavioural channels (their facial expressions, tone of voice, etc.). Participant-receivers (N = 223) then rated the messages on a 9-point scale of sincerity in one of three conditions: a verbal modality condition, an audio modality condition, or an audio-visual modality condition. They hypothesized that lies would be more detectable when the receivers had access to the same channel that the sender was trying to control.

To manipulate which channel participant-senders attempted to control during the opinion task, they were told that they would have the opportunity to make a good impression by using only certain cues. Those in the face condition were told that their partner was behind a one-way mirror and could see only the sender’s face and could not hear anything the sender said.
Those in the body condition received the same instruction, except they were told that their partner could only see them from the neck down. Those in the visual condition were told that their partner could see their facial expressions and body movements but could not hear what they said. Those in the tone of voice condition were told that their partner would only be able to hear their tone of voice and would not be able to hear their words or see their facial expressions or body movements. Those in the verbal condition were told that their partner would only be able to read what the subject said and would not be able to see or hear them. Those in the audio condition were told that their partner could only hear what they said but could not see them. Those


in the audio-visual condition were told that their partner could see their faces and body movements and could hear their words and tone of voice.

In addition to manipulating the channel that the participant-senders were attempting to control, DePaulo et al. (1988) also attempted to further replicate the MIE with a less inclusive motivational manipulation than that used in DePaulo et al. (1983). Like the earlier study, sender motivation was manipulated by providing participant-senders with one of two different instruction sheets outlining the results of some fictitious research in which deceptive ability was described as either relevant (high motivation condition) or irrelevant (low motivation condition) to personal and professional success. Specifically, those in the low motivation condition were told that the ability to convey certain impressions was not related to other skills such as making and keeping friends or career success, whereas those in the high motivation condition were told the opposite: that the ability to convey certain impressions is related to skills such as making and keeping friends and career success. Those in the high motivation condition were also told that people who are skilled at impression management were also extremely intelligent in ways that are not measured by conventional IQ tests. Unlike the earlier study, however, no specifics regarding the recordings were provided to participants.

Message veracity and message type were manipulated using the same method used in DePaulo et al. (1985), namely via the 4 sets of written instructions. Again, each set of instructions contained 4 pieces of information: (1) the particular issue to be discussed, (2) the participant-sender’s original position on the issue (determined through a preliminary survey), (3) the partner’s position on the issue, and (4) the position that the participant-sender was to attempt to convey.
Message type was manipulated by having the partner’s position on the issue made consistent with the position the participant-sender was meant to convey for two of the topics (ingratiating condition) and inconsistent for the other two (non-ingratiating condition). Message veracity was manipulated by having the position the participant-sender was meant to convey made consistent with their original position for two of the topics (truth condition) and inconsistent for the other two topics (lie condition).

The authors reported that while the overall pattern of sender detectability scores was in line with their predictions, with higher detectability scores observed in conditions where the receivers had access to the same channel the sender was trying to


control, the overall test was not statistically significant. To further investigate their predictions the authors conducted follow-up exploratory analyses within each of the different attempted control conditions. The authors reported that for senders attempting to control all of their verbal and nonverbal behaviours simultaneously (the audio-visual condition), detectability scores were higher when the receivers had access to nonverbal behaviours than when they only had access to verbal behaviours. None of the other follow-up analyses were statistically significant.

With regard to the different motivation manipulations, the authors conducted analyses on detectability scores (analysis 1), reporting that the three-way interaction between motivation condition, message type and rating condition was statistically significant. They concluded that the ingratiating lies were more detectable when the receivers had access to nonverbal behaviours than when they only had access to verbal behaviours, and that this effect was only observed within the high motivation condition. To investigate whether the observed differences between rating conditions were greater for ingratiating messages compared to non-ingratiating messages, the authors calculated a new dependent variable by subtracting the detectability scores for the non-ingratiating messages from the detectability scores for the ingratiating messages (analysis 2), arguing that higher scores indicated that the ingratiating lies were more detectable than the non-ingratiating lies. They reported that within the high motivation condition, the ingratiating lies were more detectable than the non-ingratiating lies only when the receivers had access to nonverbal behaviours.

It is important to point out that no formal follow-up tests for the effects discussed above were actually reported in DePaulo et al. (1988).
For analysis 1 (where the detectability scores were used as the dependent variable), the article simply states that the interaction was significant, provides the omnibus test results, and draws attention to the pattern of mean detectability scores. No formal follow-up tests investigating the interaction were reported. For analysis 2 (where the difference between the detectability scores was used as the dependent variable), the article simply states that the scores were in accord with predictions. No omnibus tests were reported, nor were the results of any follow-up tests. More information would be required to appropriately evaluate the support for the authors’ inferences. Specifically, formal follow-up tests would be required to quantify the uncertainty associated with the population-level estimates. While it may have been the case that the particular sample of ingratiating lies


were more detectable when the receivers had access to nonverbal behaviours than when they only had access to their verbal behaviours, and that this difference only occurred in the high motivation condition (analysis 1), this does not necessarily mean that this difference exists in the population, as the authors propose. If the experiment were repeated, the results might show that the second sample of ingratiating lies were less detectable when the receivers had access to nonverbal behaviours than when they only had access to their verbal behaviours, and that this difference occurred in the high motivation and low motivation conditions. While both results are technically possible, they are not equally probable if one makes certain assumptions regarding the state of the population. The role of inferential statistics is to estimate the likelihood of a given result under competing population-level assumptions. As this information was not reported it is impossible to determine how much the evidence supports the authors’ hypothesis relative to the null hypothesis.

Even if the results of the follow-up tests were reported, any differences would still be difficult to interpret. As previously discussed, the results from analyses conducted on detectability scores (analysis 1) are difficult to interpret because the analyses do not assess whether higher order effects (interactions) are driven by changes in the mean ratings for the truthful messages, the deceptive messages, or both types of messages. Conducting analyses on the differences in detectability scores (analysis 2) compounds the problem. In analysis 2, the dependent variable was calculated by subtracting the detectability scores for the non-ingratiating messages from the detectability scores for the ingratiating messages.
This procedure not only removes the absolute position of the cell means for the truthful and deceptive messages, but also removes the absolute position of the mean detectability scores themselves. This means that there are many different patterns of cell means that could give rise to the final scores, with the authors’ interpretation of the final scores only one of many different plausible interpretations.

While DePaulo et al. (1988) argued that their results were consistent with the predicted MIE, they also hypothesized that certain senders may be more or less susceptible to the MIE. Specifically, they argued that senders who are especially skilled or confident in the expressive domain, such as those who are physically attractive, may be less susceptible. They argued that physically attractive people tend to have more self-confidence, are more accustomed to being the objects of scrutiny and may have had


a more supportive environment where they were able to practice and develop their communicative skills compared to those who are unattractive. They therefore predicted that as sender attractiveness increases the MIE would decrease. To ascertain the physical attractiveness of each participant-sender, a separate sample of participant-receivers watched the recordings and rated the physical attractiveness of each sender on a 9-point scale. These ratings were used to split the senders into three groups: low, moderate, and high attractiveness.

The authors reported that the four-way interaction between motivation condition, message veracity, rating condition, and sender attractiveness was statistically significant. To investigate the interaction they conducted follow-up analyses on the differences between detectability scores, reporting that when the receivers had access to the nonverbal behaviours, the ingratiating lies produced under conditions of high motivation were more detectable than the non-ingratiating lies produced under conditions of high motivation, with the effect being greatest among the unattractive senders, less evident among the moderately attractive senders and least evident among the highly attractive senders. As previously discussed, however, the results from analyses where the dependent variable is the difference between detectability scores are difficult to interpret. This is because many different combinations of cell means can give rise to the same final score, meaning that the authors’ interpretation is only one of many different plausible interpretations.

DePaulo, LeMay, and Epstein (1991) argued that the MIE observed in two of their previous studies (DePaulo et al., 1985; DePaulo et al., 1988) may have been moderated by senders’ expectations for success, with the MIE being more pronounced when expectations for success were low.
In particular, they argued that while the senders in the DePaulo et al. (1985) study would have been more motivated to impress an attractive partner than an unattractive partner, they would have also been more insecure about their ability to do so. They argued that the MIE would be more evident under conditions of high motivation but low expectations for success. They also argued that the supposed relationship between expectations for success and the MIE could account for some of the findings reported in DePaulo et al. (1988), with unattractive speakers being more susceptible to the MIE because they likely had less confidence (low expectations for success) in social situations. To investigate whether expectations for success moderated the MIE, DePaulo et al. (1991) had women (N = 60) who had previously described themselves as very


independent tell two stories (one true and one false) which highlighted their independence (identity-affirming messages) and two stories which highlighted their dependence (identity-repudiating messages). These stories were told to an interviewer who was either described as very trusting (high expectations for success condition) or very wary (low expectations for success condition). Furthermore, they either expected to be evaluated by the interviewer with regard to their sincerity (high importance of success condition) or not be evaluated by the interviewer (low importance of success condition). The recordings were then rated for sincerity on a 9-point scale by an independent sample of participant-receivers (N = 816) in one of three rating conditions: a verbal only condition, an audio only condition, and an audio-visual condition.

They hypothesized that when people who see themselves as independent, and claim that it is very important to be independent, tell lies that highlight their independence (identity-affirming messages), they will appear less sincere than when they tell lies that highlight their dependence (identity-repudiating messages) because they would be more motivated to be believed when telling identity-affirming lies than identity-repudiating lies. They further hypothesised that this MIE would only occur when expectations for success were low.

Sender expectations for success were manipulated by having senders read one of two information sheets that provided information about the interviewer. Both information sheets included some handwritten notes describing the interviewer’s previous experience and some reasons for wanting to be a clinical psychologist. The bottom of the page contained the interviewer’s results from a test of nonverbal sensitivity and a trust versus wariness test with an accompanying interpretation.
In the high expectations for success condition, the scores indicated that the interviewer scored in the 51st percentile on the test of nonverbal sensitivity and in the 94th percentile in the trusting direction of the trust versus wariness test, with the accompanying interpretation stating that the interviewer was very open and trusting, took things at face value and was only average at reading nonverbal cues. In the low expectations for success condition, the scores indicated that the interviewer scored in the 91st percentile on the test of nonverbal sensitivity and in the 94th percentile in the wary direction of the trust versus wariness test, with the accompanying interpretation stating that the interviewer was very cautious and wary, did not always take things at face value and was very highly skilled at reading nonverbal cues.


Importance of success was manipulated by having senders read one of two additional information sheets describing the interviewer’s task. In the high importance condition, the interviewer was presented as an evaluator. Senders were told that each time they finished a message they would be required to answer a few questions while the interviewer wrote down their impressions of the sender’s honesty and sincerity. They were also told that after they had provided all four messages the interviewer would convey to them their overall impression of the sender’s honesty and sincerity. In the low importance condition the interviewer was not presented as an evaluator. Senders were told that each time they finished a message they would be required to answer a few questions while the interviewer prepared for the next question. They were also told that after they had provided all four messages the experiment would be over.

The authors reported the results of a 2 x 2 x 2 x 2 x 3 ANOVA where the dependent variable was the ratings of sincerity (aggregated by-sender). They reported that the four-way interaction between expectations for success, importance of success, identity relevance and message veracity was statistically significant. The authors followed up this interaction by examining differences in detectability scores. They reported that the identity-affirming lies told under conditions of high importance and low expectations for success were the most readily detectable lies, with identity-repudiating lies told under the same conditions being not at all detectable. They also reported that high expectations for success seemed to inoculate senders against the MIE, with identity-affirming lies told under conditions of high importance being less detectable when expectations for success were high compared to when they were low.
In addition to the follow-up tests conducted on detectability scores, the authors also reported the results of two contrast tests investigating whether the three-way interaction between expectations for success, importance of success, and identity relevance was significant at each level of message veracity. That is, they sought to investigate whether the observed mean differences in detectability scores were caused by changes in the sincerity ratings for the truthful messages, the deceptive messages, or both. While the authors reported that the contrast testing the interaction among the truthful messages was nearly significant, the contrast testing the interaction among the deceptive messages was not. While the authors acknowledged that the identity relevance manipulation only had a significant impact on the perceived sincerity of the truthful messages, they argued that this effect can still be considered as a MIE because the senders failed to make their identity-affirming lies seem just as sincere as their identity-affirming truths when their motivation to succeed was high. Specifically, they argued that “any discrepancy that develops between truths and lies is indicative of a failure of expressive control” (p. 21). This interpretation seems misleading as the motivation manipulation did not cause perceptions of credibility to decrease in any of the conditions. Indeed, it could be argued that those in the high motivation condition were more successful at regulating their expressive behaviours when they were telling the truth, appearing more truthful than those who did not receive the motivational manipulation. Furthermore, the simple effect of motivation among the deceptive messages was not statistically significant, with the mean ratings suggesting that the deceptive messages produced in the high motivation condition were perceived as just as sincere as those produced in the control condition. Given these findings, it seems inappropriate to describe the effect of motivation as ‘impairing’.

While the experiments conducted by DePaulo and colleagues were motivated by a MIE hypothesis, Burgoon and Floyd (2000) argued that the theory suffers from important conceptual shortcomings and that the purposeful regulation of expressive behaviours may actually increase the success of deception attempts. Drawing on IDT, Burgoon and Floyd (2000) made four specific predictions regarding the effects of sender motivation.
They hypothesised that (1) motivation would be positively related to verbal and nonverbal performance, regardless of message veracity, (2) as motivation increases, receiver detection accuracy decreases¹², (3) relative to interactions between strangers, senders interacting with friends or acquaintances perform better verbally and nonverbally, regardless of message veracity, and (4) receivers interacting with deceptive friends have higher truth biases, more favourable judgements of sender honesty and trustworthiness, and worse detection accuracy than receivers interacting with strangers.

12 While their original prediction was that “as motivation increases, receiver detection accuracy decreases” (p. 249), a more accurate prediction based on their rationale would be that, as motivation increases, detection accuracy for lies decreases while detection accuracy for truths increases.

To test their predictions, they had participant-senders (N = 64) complete four conversations (two truthful and two deceptive) with either a same-sex friend they had brought with them to the experiment or with another participant with whom they were not acquainted. Before the conversations began, each sender completed 14 rating scales designed to measure several aspects of their motivation. The scores from the rating scales were combined into 3 composite measures of motivation reflecting the desire to deceive successfully, the desire to manage the relationship and the desire to manage arousal. Following the conversations, each sender also completed 3 additional rating scales with the scores combined into 1 composite measure reflecting the degree to which they monitored their performance and were attentive to their conversational partner’s feedback (monitoring).

After each participant-sender and partner pair completed the four conversations they were each seated in separate rooms where they watched two of the recorded conversations (one truthful and one deceptive). Once they finished watching each of the recorded conversations the senders were asked to rate how truthful they were in their responses to each of the target topics, whereas partners were asked to rate how truthful they thought the sender was in their response to each of the target topics. These ratings were used to calculate accuracy and truth bias scores, with the absolute difference between the sender and partner ratings taken as a measure of accuracy. Truth bias scores were calculated by subtracting sender ratings from partner ratings.

In addition to rating how truthful they thought the sender was in their response to each of the target topics, partners also completed a further 52 rating scales designed to assess how they perceived the sender’s verbal and nonverbal performances. Sixteen of the rating scales were designed to assess senders’ perceived verbal performance, with the resultant scores combined into 2 composite measures reflecting how truthful the responses were perceived to be and how attributable to the sender they were (truthful/personal) and how detailed, non-ambiguous and relevant the responses were deemed to be (completeness/clarity/directness).
The remaining 36 rating scales were designed to assess senders’ perceived nonverbal performance, with the resultant scores combined into a further 6 composite measures reflecting how involved the sender was perceived to be (involvement/dominance), the extent to which the sender behaved in a typical and appropriate manner (expectedness), how pleasant the interaction was (pleasantness), how skilfully the sender managed the conversation (non-impairment), the overall impression the sender conveyed (good impression) and the perceived trustworthiness, believability and general rapport of the sender (trust). In addition to the partner ratings of nonverbal performance, each sender’s nonverbal performance was also rated by three trained coders. The coders rated the sender’s nonverbal performance on scales reflecting dominance, involvement and pleasantness.

To test their first hypothesis, the authors examined the relationships between the 4 composite measures of motivation provided by the senders and the 6 composite measures of the sender’s verbal and nonverbal performance provided by the partners. The relationships between the 4 measures of motivation provided by the senders and the 3 measures of the sender’s verbal and nonverbal performance provided by the coders were also examined. As the authors were interested in whether the relationships varied as a function of message veracity (truth vs. lie) and/or as a function of veracity order (truth first vs. truth second), three correlational matrices were examined for each set of relationships (3 using the partner ratings and 3 using the coder ratings). The first correlation matrix contained 160 correlation coefficients estimating the relationships within each cell of the two crossed factors (truth within truth first, lie within truth first, truth within truth second and lie within truth second). The second correlation matrix contained 80 correlation coefficients estimating the relationships at each level of message veracity (averaging over the levels of veracity order) while the third contained another 80 correlation coefficients estimating the relationships at each level of veracity order (averaging over the levels of message veracity). Of the 320 correlation coefficients examined, 33 (10.31%) were reported as statistically significant at the .05 level. While the authors concluded that the majority of the significant results were consistent with their predictions, the number of statistically significant results does not exceed that expected under the null hypothesis.
As the authors computed significance levels using 1-tailed tests, one would expect around 10% of the p values to be less than .05 when no actual relationships exist in the population. In other words, the number of observed significant relationships would likely occur simply due to random sampling error. If an appropriate adjustment were applied to account for the multiplicity of tests, it is likely that none of the reported relationships would maintain their statistically significant status.

To test their second hypothesis, the authors examined the relationships between the 4 composite measures of motivation provided by the participant-senders and the composite rating of trust provided by the partners. The relationships between the 4 composite measures of motivation provided by the senders and the measures of accuracy and truth bias were also examined. Three correlational matrices were examined¹³. Of the 96 correlation coefficients examined, 19 (19.79%) were reported as statistically significant at the .05 level (around 10 more than expected under the null hypothesis). While the authors acknowledged that not all the significant relationships were entirely consistent with their hypothesis, they concluded that the overall pattern of results largely supported their predictions.

Several important limitations, however, constrain this conclusion. A particular limitation of the correlational analysis reported by the authors is that the entire sample was analysed with no regard for potential between-group differences (friend vs. stranger). By analysing the entire sample, between-group differences in sender motivation scores and partner ratings of trust and truthfulness may falsely manifest as positive correlations. Importantly, by using relational status to delineate two groups that naturally differ in their inherent level of motivation, the authors introduce a fundamental bias. Namely, friends presumably have more background knowledge of their conversational partner than strangers. For instance, during one of the conversations senders were asked to tell their partners about the most significant person in their life. While a stranger may have no background knowledge regarding the sender’s actual true answer to this question, friends may be all too aware of who is the most significant person in the sender’s life, greatly increasing their ability to tell when the sender is being truthful and when they are being deceptive. This fundamental bias confounds individual differences in sender motivation with individual differences in knowledge of the truth.
That is, when relational status is ignored, it is not clear whether positive correlations are caused by individual differences in sender motivation or whether they are caused by inherent between-group differences with regard to the partner’s knowledge of the truth. To disentangle these competing explanations the relationships should be estimated within each level of relational status.

13 The first correlation matrix contained 48 correlation coefficients estimating the relationships within each cell of the two crossed factors, the second contained 24 correlation coefficients estimating the relationships at each level of message veracity (averaging over the levels of veracity order), while the third contained another 24 correlation coefficients estimating the relationships at each level of veracity order (averaging over the levels of message veracity).

To test their third hypothesis, the authors conducted 42 t-tests comparing the ratings of verbal and nonverbal performance provided by friends to the ratings provided by strangers. They also conducted a further 18 t-tests where the ratings of verbal and nonverbal performance were provided by the trained coders. All comparisons were first made within each level of message veracity and then across the levels of message veracity. When the ratings were provided by the partners, 3 tests were reported as statistically significant (1 more than expected under the null hypothesis). Again, however, these ratings confound the effects of sender motivation with the effects of relational partner (friends may provide higher ratings of pleasantness simply because they are rating a friend). When the ratings were provided by the coders, 2 tests were reported as statistically significant (1 more than expected under the null hypothesis). While these ratings remove the inherent biases associated with the partner ratings, the 2 significant results both pertained to the coder ratings of pleasantness, with the first significant result observed among the deceptive messages (senders interacting with friends received higher ratings than those interacting with strangers) and the second result observed across the levels of message veracity. As the subset of data used for the first test is nested within the data used for the second test, it is unsurprising that the second test was significant. While the authors concluded that the second test indicated that the friends were perceived as more pleasant regardless of message veracity, this is misleading as there was no effect of relational status observed among the truthful messages.

To test their fourth hypothesis, the authors conducted 18 t-tests examining whether the friend group differed from the stranger group with regard to the measures of trust, truth bias, and accuracy. While 4 of these tests were reported as statistically significant (around 3 more than expected under the null hypothesis), once again the ratings confound motivational effects with relational effects. When a between-groups analysis is conducted, a positive mean difference may reflect nothing more than the fact that those with greater knowledge of the truth (the friends) are better able to assess adherence/deviations from the truth than those with less knowledge of the truth (the strangers).
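The expected-count comparisons used throughout this critique can be checked with a few lines of arithmetic. The sketch below is illustrative only: the per-test rates of .10 for the correlation matrices and .05 for the t-tests follow the reasoning given above, while the function names and the assumption that the tests are independent (needed for the binomial tail probability) are introduced here for illustration and are unlikely to hold exactly for statistics computed on overlapping data.

```python
from math import comb

def expected_false_positives(n_tests, rate):
    """Number of significant results expected if every null hypothesis is true."""
    return n_tests * rate

def prob_at_least(k, n_tests, rate):
    """P(X >= k) for X ~ Binomial(n_tests, rate); assumes independent tests."""
    return sum(
        comb(n_tests, i) * rate**i * (1 - rate) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )

# (label, number of tests, number reported significant, assumed null rate)
analyses = [
    ("H1 correlations", 320, 33, 0.10),
    ("H2 correlations", 96, 19, 0.10),
    ("H3 t-tests, partner ratings", 42, 3, 0.05),
    ("H3 t-tests, coder ratings", 18, 2, 0.05),
    ("H4 t-tests", 18, 4, 0.05),
]

for label, n, k, rate in analyses:
    expected = expected_false_positives(n, rate)
    tail = prob_at_least(k, n, rate)
    print(f"{label}: observed {k}, expected {expected:.1f}, P(X >= {k}) = {tail:.3f}")
```

The tail probabilities are a rough guide rather than a formal test, because the independence assumption is a simplification.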


Summary

The majority of the empirical research investigating the influence of sender motivation has sought to test a MIE hypothesis (DePaulo et al., 1983; DePaulo et al., 1985; DePaulo et al., 1988; DePaulo et al., 1991). While the data from these experiments were used to test a large number of hypotheses, with no apparent correction for the multiplicity of tests, taken together the results show that the senders who received the experimental manipulations (assumed to increase levels of sender motivation) typically produced truthful and deceptive messages that were more discriminable than those produced by the senders who did not receive the experimental manipulations. That is, those who received the experimental manipulations tended to have larger detectability scores than those who did not.

While at first glance the experimental data may appear consistent with a MIE hypothesis, the relationship between sender motivation and perceptions of credibility may be more complex than originally proposed. Importantly, most of the reported results were obtained from analyses where the dependent variable consisted of detectability scores. As discussed earlier, the problem with these analyses is that any distinct difference between two detectability scores (a difference of one unit, for example) can be caused by many different combinations of credibility ratings (for example, an increase of one unit across the truthful conditions (low motivation vs. high motivation) or a decrease of one unit across the deceptive conditions). This is an important limitation because the MIE literature makes specific predictions regarding the interaction between sender motivation and message veracity. Specifically, the MIE literature predicts that high levels of sender motivation cause liars to experience higher levels of emotional arousal, cognitive load, and behavioural control than less motivated liars and that these differences increase the manifestation of nonverbal cues to deception.
These nonverbal cues are thought to signal to receivers when the sender is lying, resulting in decreased perceptions of credibility for deceptive messages only. These predictions refer to a simple negative effect within the deceptive conditions of the reported experiments. As the reported differences between the mean detectability scores for the low and high motivation conditions may have been caused by changes in the credibility ratings for the truthful messages, the deceptive messages, or both types of messages, the specific predictions of the MIE literature cannot be evaluated.
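The ambiguity described above can be made concrete with a small numerical sketch. It assumes, for illustration only, that a detectability score is the mean credibility rating for truthful messages minus the mean rating for deceptive messages; the function name and the rating values are hypothetical and are not taken from any of the reviewed experiments.

```python
def detectability(truth_rating, lie_rating):
    """Assumed definition: credibility of truths minus credibility of lies."""
    return truth_rating - lie_rating

# Hypothetical mean credibility ratings in the low motivation condition.
low = {"truth": 6.0, "lie": 5.0}

# Two hypothetical high motivation outcomes with different causes.
high_truths_up = {"truth": 7.0, "lie": 5.0}  # truths seem more credible
high_lies_down = {"truth": 6.0, "lie": 4.0}  # lies seem less credible

d_low = detectability(low["truth"], low["lie"])
d_truths_up = detectability(high_truths_up["truth"], high_truths_up["lie"])
d_lies_down = detectability(high_lies_down["truth"], high_lies_down["lie"])

# Change in detectability relative to low motivation, for each scenario.
print(d_truths_up - d_low, d_lies_down - d_low)

# Change in the credibility of the lies alone, for each scenario.
print(high_truths_up["lie"] - low["lie"], high_lies_down["lie"] - low["lie"])
```

In both hypothetical scenarios the detectability score rises by the same amount, yet only the second involves any drop in the credibility of the deceptive messages, which is the simple effect the MIE literature predicts.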


While Burgoon and Floyd (2000) asserted that the MIE literature predictions are specific to instances of deception, such predictions do not necessarily follow from the literature on the SPP¹⁴. Depending on the situation, truth-tellers may also regulate their expressive behaviours in an attempt to appear credible. They may also experience arousal if they perceive the consequences of being believed or not believed as meaningful. Indeed, the predictions made by DePaulo and colleagues rest on the assumption that their experimental manipulations influenced the motivation to succeed with only the deceptive messages, but not with the truthful ones. This seems unlikely, however, as the manipulations were applied across both conditions. Under these circumstances it seems that the SPP would predict that high levels of sender motivation would impair both deceptive and truthful performances (a main effect of sender motivation), rather than only deceptive performances (an interaction between sender motivation and message veracity). While this prediction has never been stated by the proponents of the SPP (to the best of my knowledge), it seems to be a natural extension of the theory. Unfortunately, while the study conducted by Burgoon and Floyd (2000) sought to test a main effect of sender motivation (even though their predictions were in the opposite direction), their study suffered from several methodological limitations and produced no more significant findings than would be expected under the null hypothesis. Their results therefore do not lend support to either theoretical account.

While it remains to be seen whether the effect of sender motivation applies to instances of truth-telling and/or deception, the evidence reviewed in this chapter does demonstrate that the effect of sender motivation is different depending on whether the message is truthful or deceptive. That is, there does appear to be an interaction between sender motivation and message veracity.
As previously discussed, this interaction suggests that the demands associated with one of the message types are more strongly influenced by changes in sender motivation than are the demands associated with the other type of message. In the context of the current thesis this is important because it suggests that the manipulation of sender motivation may provide a mechanism whereby the cognitive costs associated with impression management may be separated from those associated with message production.

14 It is again important to point out that we are treating the MIE and SPP literature separately here, although we acknowledge that the two are interrelated. This is because the predictions outlined in the MIE literature do not necessarily follow from those made from the SPP.

From a cognitive perspective, the demands associated with self-presentation can be divided into two broad groups: (1) those associated with impression management and (2) those associated with message production. While both truth-tellers and liars are capable of engaging in impression management, the processes associated with message production tend to be specific to the type of message. Importantly, this means that while the relationship between the amount of impression management senders engage in and their level of motivation may be independent of message veracity, the relationship between the cognitive costs of message production and sender motivation may depend on whether the message is truthful or deceptive. This idea can be clarified by considering how the processes associated with each of these two broad groups might change with changes in sender motivation.

The relationship between the amount of impression management senders engage in and their level of motivation may be independent of message veracity because senders who are equally motivated to be believed may engage in the same amount of impression management, irrespective of whether they are telling the truth or lying. For example, when neither truth-tellers nor liars are particularly motivated to be believed, both groups may engage in low levels of impression management. That is, neither group may try particularly hard to control their expressive behaviours and/or monitor the receiver’s behaviours for cues of suspicion, for example. When truth-tellers and liars are both highly motivated to be believed, however, both groups may engage in high levels of impression management. This is not to say that truth-tellers and liars necessarily engage in the same amount of impression management (liars are typically thought to be more motivated than truth-tellers), only that the relationship between the amount of impression management the sender engages in and their level of motivation is not necessarily related to whether the message is truthful or deceptive.

In contrast to the relationship between the amount of impression management senders engage in and their level of motivation, the relationship between the cognitive costs of message production and sender motivation may depend on whether the message is truthful or deceptive. When both truth-tellers and liars are highly motivated to be believed, both groups may put more effort into telling their stories. According to Sporer and Schwandt (2006, 2007), when the message is deceptive the details of the story must be freely invented and/or pieced together from script knowledge of comparable situations and events¹⁵. In this instance the more elaborate and detailed the story is the greater the cognitive costs associated with message production will be. When the message is truthful, however, the details of the story already exist and tend to have a natural structure. This means that truth-tellers, in contrast to liars, must simply recall and reconstruct the relevant episodic or autobiographical information from memory. The cognitive costs associated with this process are unlikely to be affected by the level of sender motivation.

Importantly, if sender motivation influences levels of impression management in both truth-tellers and liars, but only influences the cognitive costs associated with deceptive message production, then different types of truthful conditions (low motivation vs. high motivation) can be used to tease apart the cognitive costs associated with each factor. That is, two sets of detectability scores can be calculated and used to isolate the effect of impression management. Specifically, if the detectability scores are calculated from a low motivation truthful condition (where the cognitive costs of impression management and message production are both low) and a high motivation deceptive condition (where the cognitive costs of impression management and message production are both high), then the detectability scores will include both the effect of message production and the effect of impression management (both of these factors differ across the two conditions used to calculate the detectability scores).
On the other hand, if the detectability scores are calculated from a high motivation truthful condition (where the cognitive costs of impression management are high but the costs of message production are low) and a high motivation deceptive condition (where the cognitive costs of impression management and message production are both high), then the detectability scores will include only the effect of message production and not the effect of impression management (the conditions only differ with regard to the demands of message production). By varying the level of sender motivation among truthful conditions, the effect of impression management can be removed from the detectability scores. This is important for research that attempts to predict the variability in detectability scores as the magnitude of any associations will depend on which set of detectability scores is used in analyses (those calculated from low or high motivation truthful conditions).

15 The cognitive processes associated with deceptive message production are thought to depend on many factors, including what the message is about (opinions/attitudes, intentions, factual events, etc.), whether it has been rehearsed, and how good the sender’s knowledge of the truth is. The different theoretical models regarding the cognition of deception will be discussed in greater detail in Chapter 4.

In the context of predicting the cognitive costs of deception, if the magnitude of the relationship between cognitive ability measures and detectability scores is smaller when the effect of impression management has been removed from the detectability scores (when the detectability scores are calculated from the high motivation truthful condition) than when the effect of impression management is included in the detectability scores (when the detectability scores are calculated from the low motivation truthful condition), then the reduction in the effect size reflects the cognitive costs of impression management. If the effect size does not vary across the different sets of detectability scores, then we can infer that the cognitive costs associated with impression management are not sufficient to influence one of the credibility ratings more so than the other. The logic of this idea is presented in Table 1.

While this may be an idealized account of how the demands of message production and impression management relate to levels of sender motivation, simple subjective manipulation checks can be presented to participants after each condition to assess whether, and to what extent, this theoretical account holds true, and the responses can be used to help interpret any differences in effect sizes. Unfortunately, none of the previous research investigating the effect of sender motivation included these types of manipulation checks, thus the degree to which levels of emotional arousal, cognitive load, and behavioural control vary with changes in sender motivation remains speculative.
The current thesis aims to apply this approach to the prediction of detectability scores and collect self-report data to help tease apart the cognitive costs associated with message production from those associated with impression management¹⁶.

In sum, the extant research investigating whether sender motivation influences perceptions of credibility suffers from several fairly serious limitations. Motivational manipulations appear to cause an increase in the discriminability of truthful and deceptive messages, but there is no empirical evidence regarding the mechanism of action (whether the difference is driven by a change in truthful performances or deceptive performances and whether senders can identify how the demands of the conditions change with changes in sender motivation). In the context of the current thesis, sender motivation will be used to attempt to tease apart the cognitive costs associated with message production from those associated with impression management. The next chapter presents a review of the literature regarding two more individual difference factors that are thought to influence perceptions of credibility, namely personality traits and social skills.

16 The complete aims of the current thesis are presented at the end of Chapter 4.

Table 2
Idealized Comparison of Effect Sizes to Infer Partial Correlations

                        Demand                              Differences Between Conditions
                        Low         High        High        C - A                    C - B                    (C - B) - (C - A)
                        Motivation  Motivation  Motivation  Differential             Differential
Effect                  Truth (A)   Truth (B)   Lie (C)     Demand        r    R     Demand        r    R     Rd
------
Impression Management   Low         High        High        Yes           +          No            0
                                                                               ++                       +     -
Message Production      Low         Low         High        Yes           +          Yes           +
Conclusion: Some of the effect of impression management was related to the ability to distribute cognitive load.
------
Impression Management   Low         High        High        Yes           0          No            0
                                                                               +                        +     0
Message Production      Low         Low         High        Yes           +          Yes           +
Conclusion: None of the effect of impression management was related to the ability to distribute cognitive load.

Note: r represents the partial correlation between the ability to distribute cognitive load and the relevant effect (unobserved). + indicates that there was a positive relationship while 0 indicates that there was no relationship. R represents the correlation between the ability to distribute cognitive load and the detectability scores (observed). + indicates that there was a positive relationship while ++ indicates that there was a strong positive relationship. Rd represents the difference between the two observed relationships. – indicates that the observed relationship was weaker when the high motivation truth was used to calculate detectability scores (C – B) than when the low motivation truth was used to calculate detectability scores (C – A) while 0 indicates that there was no difference between the two observed relationships.
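The comparison summarised in the table can be sketched in code. The example below is a hypothetical illustration rather than the analysis used in this thesis: the sender data are invented, the variable and function names are introduced here for illustration, and detectability is computed as lie credibility minus truth credibility purely to match the C - A and C - B labels in the table.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Invented data for six senders (all values hypothetical).
ability = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]     # ability to distribute cognitive load
truth_low = [6.0, 6.2, 5.8, 6.1, 5.9, 6.0]   # A: low motivation truth
truth_high = [5.0, 5.2, 5.6, 5.5, 6.0, 6.1]  # B: high motivation truth
lie_high = [3.5, 3.0, 4.5, 4.0, 5.5, 5.0]    # C: high motivation lie

# Two sets of detectability scores, one per truthful baseline.
c_minus_a = [c - a for c, a in zip(lie_high, truth_low)]
c_minus_b = [c - b for c, b in zip(lie_high, truth_high)]

r_ca = pearson_r(ability, c_minus_a)  # R for C - A (both effects included)
r_cb = pearson_r(ability, c_minus_b)  # R for C - B (message production only)
rd = r_cb - r_ca                      # Rd: difference between the observed relationships

print(f"R(C - A) = {r_ca:.2f}, R(C - B) = {r_cb:.2f}, Rd = {rd:.2f}")
```

With these invented values the correlation is weaker for C - B than for C - A, so Rd is negative, which corresponds to the first scenario in the table.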

Chapter 3: The influence of personality traits and social skills ______

Chapter 3
The Influence of Personality Traits and Social Skills

Previous research has investigated whether sender personality traits and/or social skills are related to the ability to deceive. It is important to review this research because the theoretical mechanism as to why personality traits and/or social skills may be related to the ability to deceive concerns the differential experiencing/regulation of emotional arousal, cognitive load, and/or behavioural control. For example, people with high levels of Machiavellianism are thought to adopt whatever strategy best serves their goals (Christie & Geis, 1970), leading them to lie more often (Kashy & DePaulo, 1996), experience less guilt during deception (Gozna, Vrij, & Bull, 2001), and to have greater confidence in their ability to deceive than people with low levels of Machiavellianism (Giammarco, Atkinson, Baughman, Veselka, & Vernon, 2013). As a result, people with high levels of Machiavellianism are thought to display fewer behavioural signs of deception and thus be better liars than those with low levels of Machiavellianism. While research investigating the relationship between Machiavellianism and behavioural displays has produced mixed results (see Exline, Thibaut, Hickey, & Gumpert, 1970; Knapp, Hart, & Dennis, 1974; O’Hair, Cody, & McLaughlin, 1981), it is important to review the broader literature on the effect of personality traits and social skills as it may provide insight into how the experiencing of emotional arousal, cognitive load, and/or behavioural control influences the ability to deceive.
While Machiavellianism was one of the first personality constructs to be examined with regard to the behavioural correlates of deception, previous research has also investigated other personality-related factors such as emotional intelligence, self- monitoring, psychopathic traits, public self-consciousness, extraversion/introversion, dominance, and the ability to act (Cody & O’Hair, 1983; Miller, deTurck, & Kalbfleisch, 1983; Porter, ten Brinke, Bajer, & Wallace, 2011; Riggio & Friedman, 1983; Riggio, Tucker, & Widaman, 1987; Siegman & Reynolds, 1983; Vrij, Edward, & Bull, 2001). These studies, however, simply focus on the relationships between certain sender characteristics and behavioural displays. They do not directly investigate whether these sender characteristics influence receiver judgements of credibility. Importantly, there is a well-documented disconnect between the behavioural displays that actually distinguish truth from lie and those that receivers believe distinguish truth from lie (see Vrij, 2008). This means that the fact that there is a relationship between 43

certain sender characteristics and behavioural displays does not necessarily mean that the characteristic will be related to receiver judgements. Ultimately, it is the receiver’s beliefs about behavioural displays that determine whether and to what extent sender characteristics are related to the ability to deceive. For instance, while people with high levels of Machiavellianism have been shown to maintain more eye contact during deception than people with low levels of Machiavellianism (Exline, Thibaut, Hickey, & Gumpert, 1970), if receivers are insensitive to the differences in eye contact, then Machiavellianism will be unrelated to the ability to deceive. On the other hand, if receivers are sensitive to the differences in eye contact and tend to associate more eye contact with truthfulness, then Machiavellianism will be positively related to the ability to deceive.

Another limitation of the studies that focus on the relationships between certain sender characteristics and behavioural displays is that they tend to examine these relationships in isolation. It may be the case that these relationships interact with each other, with receivers using a complex combination of behavioural displays to inform credibility judgements. To directly examine whether certain sender characteristics are related to the ability to deceive, it is necessary to actually obtain receiver judgements of credibility. Therefore, this chapter will review the select studies where estimates of credibility have actually been obtained by receiver judgements of credibility.

Individual Differences in Personality Traits and Social Skills: The Empirical Evidence

One of the first empirical investigations into how sender detectability scores are influenced by personality traits was conducted by DePaulo and Rosenthal (1979). They sought to investigate whether people high in Machiavellianism were perceived to be more credible while lying compared to those low in Machiavellianism. In their study, participant-senders (N = 40) completed a scale designed to measure Machiavellianism and were recorded while they truthfully described someone they liked (L-truth) and someone they disliked (D-truth). They were also recorded while they described the person they liked as if they actually disliked them (D-lie) and the person they disliked as


if they actually liked them (L-lie)17. The same group of participant-senders returned later to act as participant-receivers, rating the other participant-senders’ person descriptions on six nine-point scales of liking, ambivalence, deception, disliking, discrepancy, and tension. To index each participant-sender’s ability to deceive while controlling for their overall credibility (their general tendency to be perceived as credible), each participant-sender’s L-truth score (the mean rating of deception given to the message where they truthfully described someone they liked) was subtracted from their corresponding L-lie score (the mean rating of deception given to the message where they deceptively described someone they disliked as though they actually liked the person). The same calculation was applied to each participant-sender’s D-truth (the mean rating of deception given to the message where they truthfully described someone they disliked) and D-lie scores (the mean rating of deception given to the message where they deceptively described someone they liked as though they actually disliked the person). To operationalize Machiavellianism, the authors applied a median split to the Machiavellianism scale scores, with participant-senders scoring above the median classified as ‘High Machs’ and those scoring below the median classified as ‘Low Machs’. The authors reported that the ‘High Mach’ group were slightly more successful at getting away with their lies compared to the ‘Low Mach’ group. The authors also reported that the ‘High Mach’ group were more successful at getting away with their lies when they were pretending to dislike someone they actually liked compared to when they were pretending to like someone they actually disliked. These conclusions, however, are not necessarily supported by the data.
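The scoring procedure described above can be made concrete with a minimal sketch. The Python below assumes four hypothetical participant-senders whose Machiavellianism scores and mean deception ratings are invented purely for illustration; the field names and the functions `detectability`, `mach_group` and `group_means` are likewise assumptions and are not taken from DePaulo and Rosenthal (1979).

```python
from statistics import mean, median

# Hypothetical data: one dict per participant-sender. "mach" is an invented
# Machiavellianism scale score; the other fields are invented mean ratings of
# deception (nine-point scale) for each of the four message types.
senders = [
    {"mach": 80, "L_truth": 3.0, "L_lie": 5.0, "D_truth": 3.0, "D_lie": 5.5},
    {"mach": 95, "L_truth": 2.5, "L_lie": 5.5, "D_truth": 3.5, "D_lie": 5.0},
    {"mach": 105, "L_truth": 3.0, "L_lie": 4.0, "D_truth": 3.0, "D_lie": 3.5},
    {"mach": 120, "L_truth": 4.0, "L_lie": 5.0, "D_truth": 4.5, "D_lie": 5.0},
]


def detectability(sender):
    """Lie-minus-truth difference in rated deceptiveness, per type of affect."""
    return {
        "like": sender["L_lie"] - sender["L_truth"],
        "dislike": sender["D_lie"] - sender["D_truth"],
    }


# Median split on the scale scores. Treating a score exactly at the median as
# 'Low Mach' is an assumption; the source only describes above and below.
cutoff = median(s["mach"] for s in senders)


def mach_group(sender):
    return "High Mach" if sender["mach"] > cutoff else "Low Mach"


def group_means(all_senders):
    """Mean detectability score for each Mach group and type of affect."""
    means = {}
    for group in ("High Mach", "Low Mach"):
        scores = [detectability(s) for s in all_senders if mach_group(s) == group]
        means[group] = {
            affect: mean(d[affect] for d in scores)
            for affect in ("like", "dislike")
        }
    return means


results = group_means(senders)
for group, by_affect in results.items():
    print(group, by_affect)
```

In this invented data set the two ‘High Mach’ senders obtain identical difference scores even though one of them is rated as more deceptive on every message, which illustrates that a difference score on its own does not reveal the underlying single cell ratings.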
Similar to the studies conducted by DePaulo and colleagues that sought to investigate the effects of sender motivation (see Chapter 2), the data analysis conducted by DePaulo and Rosenthal (1979) also used detectability scores as the dependent variable in analyses. As previously described, any particular difference across detectability scores can be caused by many different combinations of single cell credibility ratings, thus the results do not necessarily indicate that the deceptive messages of the ‘High Mach’ group were rated as less deceptive than those of the ‘Low

17 Participant-senders were also recorded while they described someone they felt ambivalent about and someone they felt indifferent about. These messages are not relevant to the current examination and are therefore not considered here.


Mach’ group, as the authors propose; rather they simply indicate that the difference between the two types of messages was greater for the ‘Low Mach’ group than it was for the ‘High Mach’ group. By taking the difference between the truthful and deceptive message means as the dependent variable, the simple effects of Mach group at each level of message veracity cannot be estimated. This issue is further complicated by the presence of an interaction between the Mach group and type of affect factors (like messages vs. dislike messages). To appropriately test the authors’ hypotheses/inferences the analysis should have been specified as a three-way interaction between Mach group, type of affect and message veracity, with the mean ratings of deception given to each message used as the dependent variable.

While the results reported in DePaulo and Rosenthal (1979) are consistent with the hypothesized facilitative effect of Machiavellianism, albeit not exclusively, further supportive evidence comes from a study conducted by Geis and Moon (1981). In their study, participant-senders (N = 128) completed the Mach IV and V Scales (Christie & Geis, 1970) before being secretly recorded while either denying or admitting to a theft. At the outset of the experiment, participant-senders were randomly allocated to either a truth or lie condition and then placed into a group consisting of three same-sex confederates. One confederate was assigned to be the participant-sender’s partner while the other two were assigned to an opposing team. Both teams then played a prisoner’s dilemma game which involved the gain or loss of money. Each trial started with a planning period where the teams could discuss whether they would cooperate or compete. During each planning period the experimenter and the team consisting of the two confederates left the room to discuss their decision in private while the participant-sender’s team remained in the room.
Once the teams had made their decisions, the confederate team and the experimenter re-entered the room, whereupon the experimenter informed both teams of the outcome, placing each team’s winnings in a separate pile on the table. For participant-senders allocated to the lie condition, during the planning period for the seventh trial the confederate partner took $4 out of the opposing team’s pile of money and put it in their pile, encouraging the participant to deny the theft if caught. When the confederate team returned to the room they accused the participant-sender’s team of taking the money. After a short debate between the confederates a direct question was posed to the participant-sender who either chose to deny or admit to the theft. The event was recorded by a hidden camera. The recordings were


subsequently shown to an independent sample of participant-receivers (N = 32) who rated half of the recordings for veracity on a 6-point scale. The authors reported that while the truthful ‘High Mach’ group (those with Machiavellianism scores above the median) was perceived as being as credible as the truthful ‘Low Mach’ group, the deceptive ‘High Mach’ group was perceived as more credible than the deceptive ‘Low Mach’ group. Furthermore, they reported that the difference between the truthful and deceptive ‘High Mach’ groups was significantly smaller than the difference between the truthful and deceptive ‘Low Mach’ groups. These findings are consistent with those reported by DePaulo and Rosenthal (1979) and provide further evidence that people with high levels of Machiavellianism tend to be better liars than those with low levels of Machiavellianism, irrespective of their general demeanour.

While the facilitative effect of Machiavellianism is thought to be a by-product of the amoral philosophy of people high in Machiavellianism and therefore present in most deceptive situations, other personality traits are thought to only benefit deception under certain circumstances. Miller, deTurck, and Kalbfleisch (1983) argued that high self-monitors may be more aware of what successful deception entails compared to low self-monitors, thus they may be more likely to benefit from the opportunity to rehearse. This reasoning was based on previous research where high self-monitors had been shown to be better at detecting deception and preparing for deception tasks compared to low self-monitors. They went on to argue that the opportunity to rehearse may impair low self-monitors as they tend to lack confidence in their deceptive ability and are more inexperienced deceivers compared to high self-monitors, thus the time spent rehearsing may serve to heighten their arousal.
To test their predictions, Miller, deTurck, and Kalbfleisch (1983) had participant-senders (N = 32) lie and tell the truth about the emotions they were experiencing while they were viewing pictures of pleasant landscapes and third-degree burn victims. The sample of participant-senders contained both high and low self-monitors. Half of the participant-senders viewed the pictures beforehand and were given 20 minutes to rehearse their messages while the other half had no knowledge of the pictures beforehand. The messages were recorded and later shown to four independent groups of participant-receivers (N = 151), with each group rating the messages from one of the four conditions for veracity (truth/lie). The authors reported that the group of


participant-receivers who rated the rehearsed high self-monitors were less accurate than the other three groups. They also reported that the other three groups did not significantly differ in terms of accuracy.

A particular limitation of the study conducted by Miller, deTurck, and Kalbfleisch (1983) pertains to the method of data analysis used. The authors assigned each group of participant-receivers to rate the messages from one of the four groups of participant-senders and then analysed the participant-receiver accuracy rates. This method ignores the random sampling error associated with participant-senders, which may result in an increased probability of a Type 1 error; thus the authors’ inferences are restricted to the particular sample of high self-monitors used in the study (Watkins & Martire, in press; see Chapter 8). This limitation is particularly problematic given the small number of participant-senders (n = 4).

While the hypothesized relationship between personality traits and the ability to deceive is grounded in the differential experiencing of emotional arousal, cognitive load, and/or behavioural control, Riggio and Friedman (1983) argued that some people may simply look more credible due to their physical characteristics or because they naturally tend to exhibit more behaviours associated with truthfulness. They argued that these natural tendencies may be related to personality traits and/or the social skills of the sender. To investigate this possibility, they had participant-senders (N = 63) complete four individual difference measures: two measures of nonverbal social skills (the Affective Communication Test and the Self-Monitoring Scale) and two measures of personality (the Personality Research Form and the Eysenck Personality Inventory).
Participant-senders were then recorded while they described six pictures (three of the pictures were described truthfully while the other three were described deceptively) and while they attempted to convey six basic emotions while saying simple sentences or a portion of the alphabet. They were also recorded while explaining what they had just finished doing in the previous task. The recorded picture descriptions were later shown to an independent sample of participant-receivers (N = 176) who rated half the messages for veracity (true/false) in one of four conditions; the full face channel (the face with video and audio), video face channel (the face with video only), full body channel (the full body with video and audio) and video body channel (the body with video only). The recorded task descriptions were shown to another sample of independent participant-receivers who made similar ratings.



The behaviours that participant-senders displayed during each of their messages were quantified by having the recordings coded by trained observers. Sixteen of these coded behavioural cue scores were entered into a principal components factor analysis in an attempt to reduce the total number of cues (plausibility was omitted as it represented the only verbal cue score). Five distinct factors emerged from the analysis: facial animation, body contact, eye contact, gestural fluency and nervous behaviours. To address their first research question, two correlation matrices were examined: the first contained the correlations between the 25 individual difference scores and the 6 behavioural cue scores while the second contained the correlations between the individual difference scores and an index of behavioural change (the difference between the truthful and deceptive behavioural cue scores). Of the 300 correlations examined, 24 were reported as statistically significant at the .05 level. To address their second research question, three additional correlation matrices were examined: the first contained the correlations between the individual difference scores and the overall judged perceptions of truthfulness, the second contained the correlations between the individual difference scores and the judged perceptions of truthfulness for the deceptive messages only, while the third contained the partial correlations between the individual difference scores and the judged perceptions of truthfulness for the deceptive messages while adjusting for general demeanour (scores for the ‘neutral interaction’). The relationships were examined within each of the 4 judging conditions. Of the 300 correlations examined, 20 were reported as statistically significant at the .05 level. These analyses, however, suffer from important limitations. The data analysis conducted by Riggio and Friedman (1983) was exploratory and resulted in a total of 600 correlation coefficients being estimated.
Even if there were no actual relationships present in the population we would expect around 30 of these estimates to have p values below .05 simply due to random sampling error. While the number of statistically significant findings reported by Riggio and Friedman (1983) is slightly more than one would expect under the null hypothesis, supporting the hypothesis that the relationship between behavioural displays and receiver judgements of credibility may be moderated by the characteristics of the sender, it is difficult to ascertain which of these findings could be considered robust and likely to replicate with new samples.
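The arithmetic behind this argument can be sketched as follows. The Python below computes the number of significant results expected under the null hypothesis (600 × .05) and, as a rough guide only, the binomial probability of observing at least the 44 significant correlations reported (24 + 20). The binomial calculation assumes that the 600 tests are independent, which is an assumption made purely for illustration and is unlikely to hold for correlations estimated from the same participant-senders; the function names are hypothetical.

```python
from math import comb


def expected_false_positives(n_tests, alpha):
    """Expected count of p < alpha results when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha):
    """P(X >= k) for X ~ Binomial(n_tests, alpha).

    Assumes the tests are independent, which is an illustrative
    simplification and not a property of the reviewed study.
    """
    below_k = sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k)
    )
    return 1.0 - below_k


n_tests = 600          # correlation coefficients estimated in total
alpha = 0.05           # significance level used by the authors
observed = 24 + 20     # significant correlations reported across both sets

expected = expected_false_positives(n_tests, alpha)
tail = prob_at_least(observed, n_tests, alpha)
print(f"expected under the null: {expected:.0f}")
print(f"P(at least {observed} significant | independent tests): {tail:.4f}")
```

Because the independence assumption is doubtful, the tail probability should be read as indicative at best; it does not identify which individual correlations are likely to replicate.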



To further investigate their hypotheses, Riggio and colleagues conducted a conceptual replication of the earlier study, the details of which are reported in three separate publications (Riggio, Salinas, & Tucker, 1988; Riggio, Tucker, & Throckmorton, 1987; Riggio, Tucker, & Widaman, 1987). They had participant-senders (N = 38) complete five measures of social skills (the Private and Public Self-Consciousness Scales, the Social Anxiety Scale, the Self-Monitoring Scale, the Affective Communication Test and a revised version of the Social Skills Inventory), three measures of personality (the 16 Personality Factors Questionnaire, the Marlowe-Crowne Social Desirability Scale and the Bem Sex Role Inventory) and an opinion survey designed to measure their opinions about a wide array of socio-political issues. Approximately one week later, the participant-senders were recorded while they attempted to communicate six persuasive messages selected from the opinion survey: two where they originally agreed with the statement (truthful condition), two where they originally disagreed with the statement (deceptive condition) and two where they originally indicated that they had no opinion on the issue (neutral condition). The first four recordings of each participant-sender were later shown to an independent sample of participant-receivers (N = 18) who rated how much they believed what the participant-sender was saying on a nine-point scale. The last two recordings of each participant-sender were shown to another independent sample of participant-receivers (N = 16) who made similar ratings. To investigate whether the social skill scores were related to the believability ratings, a single correlation matrix was examined. Of the 60 correlations examined, 22 were reported as statistically significant, many more than would be expected under the null hypothesis.
Specifically, senders with higher levels of social control and extraversion tended to be perceived as more believable for each type of message (truthful, deceptive and neutral). Emotional and social expressivity was also related to greater receiver ratings of believability for truthful and neutral messages, but not for deceptive messages. Senders with higher levels of social anxiety, on the other hand, tended to be perceived as less believable. The authors concluded that the results demonstrated that those who have a higher level of social skills tend to have an honest demeanour whereas those who are more socially anxious tend to have a dishonest demeanour.



To investigate whether the measures of personality were related to the believability ratings, another correlation matrix was examined. Of the 60 correlations examined, 21 were reported as statistically significant. Specifically, higher scores on the 16 Personality Factors Questionnaire scales of outgoing/warm, happy-go-lucky/impulsiveness/enthusiastic, venturesome/bold/energetic, self-controlled/self-discipline and motivational distortion were positively related to perceived believability in all conditions whereas lower scores on the scales of self-sufficiency and apprehensive/insecure/guilt were negatively related to perceived believability in all conditions. Finally, those scoring higher on the Marlowe-Crowne Social Desirability Scale and the Bem Sex Role Inventory masculinity subscale tended to be perceived as more believable.

To examine how the observed relationships between social skills and perceptions of credibility were mediated by certain behavioural cue displays, Riggio, Tucker, and Widaman (1987) had trained observers code the occurrence of nine behavioural cue displays. Similar to the procedures used in Riggio and Friedman (1983), the coded behavioural cue scores were entered into factor analyses in an attempt to reduce the number of cues. The analyses consistently indicated that smiling, laughing and changes in facial expressions loaded on a single factor, deemed emotional reactions. Furthermore, speech rate and speech segments were combined as they were considered indicators of a general verbal fluency factor. The remaining four cues, including head movements, eye contact, the use of ‘I’, and the use of ‘We’, were not combined and entered separately into subsequent analyses.
To investigate the degree to which the coded behavioural cues mediated the relationships between social skills and deceptive ability, some of the scores on the individual differences measures, the coded behavioural cue scores, and the believability scores were entered into a series of structural equation models. Analyses were performed separately for the truth and deception conditions. The analyses revealed that those who had higher levels of social skills tended to have higher levels of verbal fluency, which in turn was associated with higher levels of believability in both the truthful and deceptive conditions. Those with higher levels of social skills also tended to use more words while including fewer arguments in the truthful condition whereas they tended to have a higher number of “I” and “We” references in the deceptive condition. Higher levels of public self-consciousness were associated with


more head movements in the truthful condition but fewer head movements in the deceptive condition, with more head movements associated with higher levels of believability in the deceptive condition only. Lastly, more emotional reactions were associated with lower levels of believability in the deceptive condition only.

An important consideration of the analysis concerns the small number of participant-senders used in the study (N = 35). While the authors note that they employed less stringent significance testing to accommodate the small sample size, this may only somewhat lessen the probability of Type 2 errors at the expense of Type 1 errors. The relationships themselves are still subject to poor estimation (large standard errors), meaning that the precise magnitudes of the relationships are difficult to quantify. Furthermore, the small sample size means that a non-significant finding is not very informative as it does not indicate with a high level of certainty that there is no actual relationship in the population.

Based on the previously reported findings, Keating and Heltman (1994) hypothesized that people with high levels of dominance may be more adept at nonverbal management and thus better liars than those with low levels of dominance. They argued that high levels of dominance emerge as a consequence of low levels of social anxiety, a quality that tends to co-occur in those with high levels of Machiavellianism and social skills. To test their prediction, they recruited two samples of participant-senders: a sample of preschool children (N = 49) and a sample of undergraduate psychology students (N = 61). The preschool children were given two drinks to taste, one sweet and the other sour. They were then recorded while they described the taste of both drinks to an adult assistant after being asked to describe both drinks as tasting good.
Dominance was assessed by having trained coders record the incidence of dominant behaviours during 60 minutes of free play time. An independent sample of participant-receivers (N = 228) rated the child messages for veracity. The authors reported that deception encoding scores (the proportion of participant-receivers who believed the child when they were lying) were positively related to the ratings of dominance. They also reported that this relationship was attenuated somewhat when truth encoding scores (the proportion of participant-receivers who believed the child when they were telling the truth) were already accounted for. The authors concluded that while the ability to be believed while lying did somewhat depend on the child’s general tendency to be believed (their demeanour),


children who had high levels of dominance tended to be believed marginally more than those with low levels of dominance, irrespective of their general demeanour.

Student encoding skill was assessed in a similar way to that of the children. Each student tasted the sweet and sour drinks and then answered some questions about which drink they liked better and why while being recorded. The students answered the questions twice, once truthfully and once deceptively. Dominance was assessed by allocating students to groups of six same-sex peers where they spent 30 minutes attempting to reach a consensus about a list of items that would be essential to survive a plane crash in frigid North America. After the 30-minute discussion the students individually rank-ordered each of their peers in terms of how dominant and influential they were during the preceding discussion. The average ranking was taken as an index of each student’s dominance. An independent sample of participant-receivers (N = 98) then rated the student messages for veracity (truth/lie). The authors reported that for male students the relationship between deception encoding scores and dominance scores was similar to that observed among the children. They reported that while the ability to be believed while lying did somewhat depend on demeanour, male students who had high levels of dominance did tend to be believed more than those with low levels of dominance, irrespective of demeanour. This relationship was not observed among the female students.

Building on the previous studies, Frank and Ekman (2004) sought to investigate the degree to which people’s ability to appear truthful is influenced by personality traits, personal values, and social skills, as well as whether it is influenced by the static features of a person’s face.
They had participant-senders (all male; N = 20) complete a preliminary version of the Self-Report Psychopathy Scale - II, the Machiavellianism IV Scale, the Rokeach Value Survey, and the Affective Communication Test. The participant-senders were also photographed in a neutral facial expression. To assess each participant-sender’s ability to appear truthful, they completed two tasks. The first task involved the mock theft of money where each participant-sender was paired with a confederate and assigned to enter a room containing $50. The participant-sender either entered the room before or after the confederate. Those who entered the room first were given the choice whether or not to take the money whereas those who entered second had to take the money if it was there. After each participant-sender exited the room they were recorded while being interrogated about the theft. All


participant-senders were instructed to deny that they had taken the money. The second task involved telling either a true or false opinion. All participant-senders were administered a questionnaire that assessed both the direction and magnitude of their opinions on social issues. The opinion that the participant felt most strongly about was selected for the interview. Participant-senders were given the option of lying or telling the truth. Of the 20 participant-senders who completed the experiment, eight lied in both conditions while seven told the truth in both. A portion of these 15 participant-senders’ crime interrogations were shown to an independent sample of participant-receivers (N = 49) where they were rated for veracity (truth/lie). A portion of their opinion interrogations were shown to another independent sample of participant-receivers (N = 54) where they made similar ratings. Furthermore, their photographs were shown to yet another independent sample of participant-receivers (N = 31) where they were also rated for veracity (truth/lie). The authors reported that while the truthfulness scores (the proportion of the participant-receivers who rated each participant-sender as telling the truth) in the mock theft condition were strongly related to those in the opinion condition, neither set of scores was related to those in the still photo condition. The scores on the individual difference measures also showed few relations to the truthfulness scores. Participant-senders who rated courageousness as an important value on the Rokeach Value Survey tended to be perceived as less truthful across both conditions while those who rated cleanliness as an important value were seen as less truthful in the crime condition only. Finally, those who rated forgiveness as important were seen as more truthful in the crime condition only.
The primary limitation of the study conducted by Frank and Ekman (2004) is the high number of correlation coefficients estimated and the small sample of participant-senders (N = 15). For the relationships between the individual difference measures and the truthfulness scores, 78 correlation coefficients were estimated, 4 of which were reported as statistically significant. This is no more than would be expected under the null hypothesis. Post hoc power analyses indicate that a sample size of 15 affords a .19 probability of detecting a medium effect (r = .30) and a .50 probability of detecting a large effect (r = .50), thus even if large relationships were present in the population their study was not well placed to detect them.
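A rough check on power figures of this kind can be sketched with the Fisher z approximation. The Python below is a minimal illustration that assumes a two-tailed test at the .05 level and a normal approximation to the sampling distribution of the transformed correlation; it is not the procedure used to obtain the figures reported above, and the function name `correlation_power` is hypothetical. Because it is an approximation, its output can be expected to sit close to, but not exactly on, values obtained from exact methods.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate two-tailed power to detect a population correlation r.

    Uses the Fisher z transformation, under which atanh(sample r) is treated
    as approximately normal with standard error 1 / sqrt(n - 3). This is an
    approximation and is least accurate for very small samples.
    """
    standard_normal = NormalDist()
    z_crit = standard_normal.inv_cdf(1 - alpha / 2)
    shift = atanh(r) * sqrt(n - 3)  # expected test statistic under r
    upper = 1 - standard_normal.cdf(z_crit - shift)
    lower = standard_normal.cdf(-z_crit - shift)
    return upper + lower


for effect in (0.30, 0.50):
    power = correlation_power(effect, n=15)
    print(f"r = {effect:.2f}, n = 15: approximate power = {power:.2f}")

# Expected number of significant results among 78 tests if every null
# hypothesis were true (alpha = .05).
print(f"expected under the null: {78 * 0.05:.1f}")
```

Under these assumptions, a sample of 15 participant-senders leaves a better-than-even chance of missing even a large population correlation.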



Summary

Most of the research reviewed in this chapter has focused on individual differences in detectability scores. In an early investigation, DePaulo and Rosenthal (1979) demonstrated that the truths and lies produced by people with relatively high levels of Machiavellianism are harder to differentiate between than those produced by people with relatively low levels of Machiavellianism. This finding was subsequently extended by Geis and Moon (1981) who demonstrated that in addition to producing truths and lies that are harder to differentiate between, the lies produced by people with relatively high levels of Machiavellianism also tend to be rated as more truthful than the lies produced by people with relatively low levels of Machiavellianism, even though the truths produced by both groups were indistinguishable from one another. While Frank and Ekman (2004) failed to replicate these findings using both a mock theft and a false opinion paradigm, the small sample size considerably limits the extent to which the data actually supports the null hypothesis. Similar results have been observed when the role of Dominance, a factor closely related to Machiavellianism, was examined. In a sample of university students and a sample of children, Keating and Heltman (1994) demonstrated that the lies produced by people with relatively higher levels of dominance tended to be rated as more truthful than those with lower levels of dominance. This finding was observed even after controlling for the variability in truthful performances.
With the exception of the study conducted by Frank and Ekman (2004), the studies mentioned above provide support for the hypothesis that the degree to which an individual is capable of regulating the underlying processes of deception (indexed through the measurement of personality variables) influences the degree to which their behaviours change during deception, and thus the degree to which their truths and lies may be correctly differentiated between by receivers. These studies, however, provide only indirect evidence for this hypothesis as they do not directly measure the degree to which each individual is capable of regulating the underlying processes of deception; rather they measure variables that are thought to co-vary with the differential regulation of these processes. For instance, according to Geis and Moon (1981), the superior deception performance observed in people with high levels of Machiavellianism may have been due to an increased ability to control the outward appearance of anxiety. Likewise, Keating and Heltman (1994) argued that the superior deception performance

observed in people with high levels of Dominance may have been due to an increased ability to manage nonverbal behavioural displays. While the observed relationships between measures of personality/social skills and the ability to deceive are consistent with the hypothesized mechanism of action, more direct measures of sender characteristics may yield stronger findings. For instance, direct measures of a person’s ability to control the outward appearance of anxiety or to manage nonverbal behavioural displays may be more strongly associated with the ability to deceive than measures of Machiavellianism or Dominance, respectively. Although these direct measures may be difficult to operationalize, other aspects of the proposed mechanism may be more easily measured, such as a person’s ability to regulate cognitive load. While most of the research reviewed in this chapter supports the hypothesis that sender characteristics are related to the ability to deceive, the evidence regarding whether sender characteristics are related to a person’s overall credibility is more mixed. Specifically, whereas the earlier study conducted by Riggio and Friedman (1983) found little evidence of an association between personality traits/social skills and overall ratings of credibility, subsequent investigations demonstrated that expressive, socially tactful people tended to appear more honest whereas socially anxious people tended to appear more dishonest (Riggio, Salinas and Tucker, 1988; Riggio, Tucker and Throckmorton, 1987; Riggio, Tucker and Widaman, 1987). It is possible that these discrepant findings are due to the different deception tasks used in each study. Participant-senders in the former study were tasked with describing pictures while those in the latter were tasked with providing opinion statements.
It may be the case that the task of communicating impersonal factual information (describing pictures) was not sufficiently engaging to elicit a person’s natural behavioural flagrancies. It is important to point out that while the latter work conducted by Riggio and colleagues provides direct evidence in support of the idea that sender characteristics are related to behavioural displays and thus receiver judgements of credibility, the research does not provide information regarding the regulation of the underlying processes involved in deception. It simply identifies senders who tend to behave in ways that receivers associate with truthfulness. The research does not identify senders who appear more truthful once their overall credibility has been accounted for. One particular aspect of deception that has been overlooked in the research reviewed so far is the cognitive demands of deception. While the literature regarding the

effect of sender motivation contends that higher levels of sender motivation are associated with higher levels of cognitive load, no evidence regarding this assertion has been put forward thus far. Furthermore, the literature regarding personality traits/social skills routinely attributes the observed effects to the differential experiencing of emotional arousal and/or the ability to control behaviours and gives little regard to a sender’s ability to regulate cognitive load. To examine the research regarding this aspect of deception, the next chapter focuses on the cognition of deception.

Chapter 4: The cognition of deception ______

Chapter 4 The Cognition of Deception

Several theories of deception include mechanisms whereby lying is thought to induce more cognitive load than telling the truth. In their formulation of the multi-factor theory of deception, Zuckerman et al. (1981) argued that deception induces more cognitive load than truth telling because liars must freely invent their stories and monitor them so that they are internally consistent and adhere to everything the receiver knows or might find out in the future. They went on to argue that because the degree to which a person experiences cognitive load influences the incidence and/or intensity of certain behaviours, the relatively high cognitive demands of deception would result in liars exhibiting more behavioural signs of cognitive load than truth-tellers. These behavioural signs of cognitive load were thought to include longer response latencies, greater pupil dilation, more speech errors, and fewer illustrators. More recently, Vrij, Fisher, Mann, and Leal (2006) described several aspects of lying that may contribute to the cognitive demands of deception. Liars (1) must formulate a coherent story; (2) are less likely to take their credibility for granted, thus are more likely to monitor and control their behaviours; (3) are more likely to monitor the interviewer’s reactions more carefully; (4) must remind themselves to act and role-play; (5) must suppress the truth since activating the truth happens habitually; and (6) must deliberately activate a lie. The authors are careful to point out, however, that “lying is more cognitively demanding to the degree that these six principles are in effect” (p. 4). For instance, truth suppression is only thought to contribute to the cognitive demands of deception if the truthful event is easily retrieved from memory. In their formulation of Interpersonal Deception Theory (IDT), Buller and Burgoon (1996) also proposed that deception generally induces more cognitive load than truth-telling.
While they agreed that maintaining internal and external consistency would be harder during deception than during truth-telling, they placed a greater emphasis on the interactive and strategic nature of deception. They argued that to be successful during deception, senders must pay close attention to receiver cues of suspicion and continuously adapt their behaviours in response to such cues while simultaneously managing their emotions. According to IDT, it is the integration of these numerous tasks that makes deception more cognitively demanding than truth-telling. While Zuckerman et al.’s (1981) multi-factor model proposes that the increased

cognitive load experienced by liars is detrimental to deception performance, IDT asserts that the net gain of the additional cognitive processes may actually benefit deception performance (Burgoon & Floyd, 2000). That is, IDT asserts that while the strategic adaptation of expressive behaviours and the monitoring of receiver cues of suspicion may indeed increase cognitive load, these processes are beneficial to performance and serve to increase perceptions of credibility. While the above theories contend that the increased cognitive demands of deception stem from the additional tasks that liars must engage in to conceal their deception, Sporer and Schwandt (2006; 2007) argued that the increased cognitive demands of deception are caused by its heavier processing requirements. They based their model of deception on Baddeley’s (1992; 2000) theory of working memory, arguing that while truth-tellers simply retrieve and reconstruct an episodic or autobiographical event from memory, liars must either freely invent (using their imagination) and/or construct a coherent story based on script knowledge of comparable situations and events. They argued that this fundamental difference causes deception to place heavier demands on the working memory system, resulting in less capacity available for speech production. The Activation-Decision-Construction Model (ADCM; Walczyk, Roper, Seemann, & Humphrey, 2003) is a theoretical cognitive framework that asserts that three additional processes are required to tell a lie. First, a posed question automatically activates the truth in long-term memory, which is then transferred to working memory and made consciously available to the liar. The liar must then purposefully decide to lie and thus actively inhibit the truthful response in working memory. Finally, the sender must construct a context-appropriate lie.
In a more recent extension of the ADCM, Walczyk, Harris, Duck, and Mulat (2014) proposed a cognitive theory of serious deception. Their Activation-Decision-Construction-Action Theory (ADCAT) contends that deception and truth-telling both involve the same four basic underlying cognitive processes. The first component (activation) concerns the cued retrieval and transfer of truthful information into the episodic buffer of working memory. While this process is thought to usually occur automatically, Walczyk et al. (2014) argued that if the episodic or semantic truths are accessed infrequently or not recently, then the central executive may be required to search memory explicitly for them. They argued that such an explicit search would

require cognitive resources. The second component (decision) concerns the impromptu or a priori decision about the level of honesty/dishonesty to be conveyed. They argued that the expected value of the truth is compared against the expected value of one or more deceptions. This comparison of expected values is thought to require several factors (the sender goals, the social context, and the truth if accessible) to be active in working memory. The central executive is also thought to inhibit information units that are inconsistent with the sender goals. The third component (construction) concerns the manipulation of information to “falsify, equivocate, omit, exaggerate, or understate or the recall of a prepared deception then adjusted for the social context” (p. 24). The extent to which each of these strategies is used is thought to influence the cognitive load imposed by message construction. The fourth and final component (action) concerns the actual delivery of the message to the receiver. It is important to point out that the ADCAT does not predict that deception is intrinsically more cognitively demanding than truth-telling. Specifically, the theory predicts that whichever message (truthful or deceptive) requires (1) a more complex response (e.g. constructed, edited, not simply recalled), (2) a more spontaneous (unrehearsed) or novel response, and/or (3) more explicit memory searching (either to retrieve a truth or to find information useful for lie construction), will impose greater cognitive load. While ADCAT was proposed as a cognitive model of high stakes deception, other models of deception are more relevant to the normal everyday lies that people tell.
Information Manipulation Theory 2 (IMT2; McCornack, Morrison, Paik, Wisner, & Zhu, 2014) argues that deception involves the covert manipulation of information along multiple dimensions and that it is a contextual problem-solving activity driven by the desire to efficiently minimise current-state/end-state discrepancies. IMT2 predicts that deception causes more cognitive load than truth telling when four conditions hold: (1) when the form of deception being produced is a bold-faced lie, (2) when the truthful information one possesses is contextually unproblematic, (3) when the truthful information is easily accessible within memory, and (4) when any false information that might be used to construct the bold-faced lie must be retrieved from long-term memory. Importantly, McCornack et al. (2014) argue that senders are guided by a principle of least effort and that they will choose the most efficient solution (truth or deception) that is predicted to achieve their end-state goal (a credible perception). As a result, they

contended that when the above four conditions hold, senders will elect to tell the truth rather than deceive as it is less cognitively demanding than deception and contextually unproblematic. When the truth is contextually problematic, however, IMT2 predicts that deception will be less cognitively demanding than the truth, thus the sender will elect to lie. While the above prediction inverts the prevalent theoretical relationship between cognitive load and message veracity, IMT2 compares the relative cognitive loads of truthful and deceptive messages within-sender. That is, the relative cognitive loads of the different message types are considered against a single set of contextual constraints. While this perspective is novel, most theories regarding the cognition of deception compare the relative cognitive loads of truthful and deceptive messages between-senders. That is, most theories contrast the levels of cognitive load experienced by liars with those experienced by truth-tellers. Importantly, these two groups usually have different contextual constraints (the truth is problematic for liars and unproblematic for truth-tellers). This means that while each type of sender may be choosing the most efficient solution to minimise current-state/end-state discrepancies, liars may still experience more cognitive load than truth-tellers. For instance, consider two suspects who are being interviewed about their whereabouts at the time of a recent murder. One of the suspects is guilty while the other is innocent. When the suspect is innocent, the truth is contextually unproblematic (they were not in the vicinity at the time of the crime). The innocent suspect can therefore freely disclose their whereabouts at the time of the crime. Lying about their whereabouts would increase the cognitive demands of the interview and would therefore be a less efficient solution to achieve the desired outcome (credibility).
When the suspect is guilty, however, the truth is contextually problematic (they were in the vicinity at the time of the crime, thus there is no feasible way to disclose the truth while maintaining a perception of credibility). The guilty suspect must therefore falsify their whereabouts at the time of the crime. In this instance, telling the truth would be a less efficient solution to achieve the desired outcome. While each suspect (guilty vs. innocent) chooses the message (truthful vs. deceptive) that best minimises current-state/end-state discrepancies, the guilty suspect’s message (deceptive) will induce more cognitive load than the innocent suspect’s message (truthful). This perspective (between-senders) is more applicable to the forensic contexts where deception detection is important.


The Cognition of Deception: The Empirical Evidence

There is a wealth of empirical evidence supporting the idea that deception is more cognitively demanding than truth telling. In experimental deception tasks, deceptive conditions are often rated by participants as more cognitively demanding than truthful conditions. This has been observed across participants as well as within participants (Caso, Gnisci, Vrij, & Mann, 2005; Gozna, Vrij, & Bull, 2001; Vrij, Edward, & Bull, 2001; Vrij & Mann, 2006; Vrij, Mann, & Fisher, 2006; Vrij, Semin, & Bull, 1996; Wright, Berry, & Bird, 2012). There is also a large amount of empirical support for the idea that levels of cognitive load influence the incidence and/or intensity of certain behaviours. When cognitive load has been experimentally manipulated, high load conditions often elicit fewer eye blinks, greater pupil dilation, longer pauses during speech, and more gaze aversion than low load conditions (Bagley & Manelis, 1979; Bauer, Strock, Goldstein, Kahneman & Beatty, 1966; Doherty-Sneddon, Bruce, Bonner, Longbotham, & Doyle, 2002; Goldman-Eisler, 1968; Wallbot & Scherer, 1991). Given these findings it is unsurprising that experimental manipulations of veracity also produce similar behavioural indices associated with high levels of cognitive load. Previous meta-analyses of laboratory based deception studies have shown that liars tend to have a higher vocal pitch, more vocal tension, longer response latencies, more speech errors and shorter messages than truth-tellers (DePaulo et al., 2003; Sporer & Schwandt, 2006; Zuckerman et al., 1981). These relationships, however, tend to be weak and heterogeneous, suggesting that they likely depend on key moderator variables, such as the type of message and the amount of preparation, planning and/or rehearsal. The presence of cognitive cues during deception has also been observed outside the laboratory.
In a field investigation, Mann, Vrij, and Bull (2002) examined the behaviours of sixteen suspects undergoing high stakes police interviews. While large individual differences were present, two significant differences emerged; suspects made longer pauses and blinked less frequently while lying – both signs of increased cognitive load. In a follow-up study, Mann and Vrij (2006) showed a selection of these recordings to police officers and asked them to rate the degree to which they thought each interviewee was experiencing cognitive load. The results indicated that the liars were perceived to be thinking harder than the truth-tellers; a finding that has also been observed in laboratory based investigations (Vrij, Edward, & Bull, 2001).


At the theoretical level the relationship between deception and behavioural indicators of cognitive load is causal. Most of the empirical evidence, however, is only correlational in nature. Notable exceptions are studies that attempt to manipulate the cognitive demands associated with self-presentations. Vrij et al. (2006) argued that by increasing the cognitive demands of an interview, behavioural indicators of cognitive load may be more prevalent during deception, thus allowing receivers to better discern truth from lie. In one of the first experiments to use such a technique, Vrij et al. (2008) had liars and truth-tellers (N = 80) either recall the components of a staged event in normal chronological order or in reverse chronological order. The authors then coded the incidence of cognitive cues, reporting that the liars in the reverse order condition had fewer auditory details, fewer contextual embeddings, more speech hesitations, a slower speech rate, more leg/foot movements, more cognitive operations, more speech errors and more eye blinks compared to the truth-tellers. The liars who told their stories in a normal chronological order, however, only had fewer hand/finger movements compared to the truth-tellers. In addition to investigating the behavioural differences between the two groups, Vrij et al. (2008) also had a group of participant-receivers (N = 55) rate the messages for veracity. The results indicated that the group of participant-receivers that rated the messages produced in the reverse order condition were more accurate than the group who rated the message produced in the normal order condition. These results have since been replicated when participant-senders lied in reverse order about a route they took (Vrij, Leal, Mann, & Fisher, 2012), when they were instructed to maintain eye-contact with an interviewer (Vrij, Mann, Leal, & Fisher, 2010), and when they lied to unanticipated questions (Vrij et al., 2009). 
These studies are particularly noteworthy because they demonstrate that the cognitive demands associated with deception may often exceed a person’s ability to process the demands without producing observable signs of cognitive strain. This is evident by the fact that an increase in task demands is associated with a decrease in task performance (more cognitive cues). They also demonstrate that lay observers correctly associate cognitive cues with deception, resulting in decreased perceptions of credibility for deceptive messages that are produced under conditions of high cognitive load. In other words, the cognitive manipulations increased the diagnosticity of behavioural cues. Importantly, the effects in these studies appear to be restricted to deceptive

messages, suggesting that the cognitive demands associated with truth telling are not normally sufficient to cause cognitive cues or that the experimental manipulations only induced higher cognitive load in the liars.

From Cognitively Demanding to Executively Demanding: Examining the Neural Mechanisms Underlying Deception While the results from behavioural analyses support the notion that deception is more cognitively demanding than telling the truth and that the cognitive demands of deception often exceed an individual’s ability to process the demands without also producing observable signs of cognitive load, the notion of ‘cognitive demand’ is fairly general and not well defined. Neuroimaging studies have provided useful insights into defining the cognitive demands of deception by identifying certain brain areas that appear to be associated with deception. In one of the first studies to apply these techniques to a deceptive act, Spence, Farrow, Herford, Wilkinson, Zheng, and Woodruff (2001) had participant-senders (N = 10) lie and tell the truth about certain actions they had done earlier that day, via a ‘yes’ or ‘no’ button push, while researchers took functional scans of their brains. They reported that deceptive button pushes took longer to initiate compared to truthful button pushes and that the deceptive button pushes were associated with greater neural activity in the ventrolateral prefrontal cortices - an area of the brain that has been associated with inhibitory control and response reversal. Subsequent neuroimaging studies have implicated other regions of the prefrontal cortex (Abe, 2009; Abe, 2011; Gombo, 2006; Johnson et al., 2004; Langleben et al., 2005; Spence et al., 2004). These regions are widely thought to be responsible for executive control processes. While the results from neuroimaging studies suggest that executive control processes are fundamental to the production and execution of a deceptive message, the precise patterns of neural activation vary considerably across studies. This suggests that deception is a complex cognitive task and that there is no unique set of cognitive processes associated with deception. 
Ganis, Kosslyn, Stose, Thompson, and Yurgelun-Todd (2003) argued that the somewhat inconsistent findings may have been due to the differences between experimental paradigms. They argued that lies can vary along many dimensions and that different types of lies would be associated with different patterns of neural activation. To investigate this prediction they scanned participant-senders (N =
10) while they told spontaneous isolated lies and memorized lies that fit into a scenario. They reported that while both types of lies were associated with more neural activation than truthful responses, compared to the memorized lies that fit into a scenario, the spontaneous isolated lies were associated with more activation in the anterior cingulate, extending into the left premotor cortex, the left precentral gyrus, the right precentral/postcentral gyrus and the right cuneus. In contrast, the memorized lies that fit into a scenario were only associated with more activation in the right anterior middle frontal gyrus compared to the spontaneous isolated lies. These results demonstrate that while different patterns of brain activity arise when people tell lies than when they tell the truth, different types of lies depend on different cognitive processes. Another particularly noteworthy study regarding the neural correlates of deception is a study conducted by Christ, Van Essen, Watson, Brubaker, and McDermott (2009). They sought to investigate how somewhat distinct executive control processes were related to deception by further delineating the brain regions that were associated with deception. They hypothesized that working memory may be important to deception as it is necessary to keep the truth in mind while formulating a deceptive response. They also hypothesized that inhibitory control may be important to deception as it is involved in suppressing a truthful response, while set shifting may be important as it is involved in switching between truthful and deceptive responses. To investigate how the distinct executive control processes were involved in deception they utilized activation likelihood estimates (ALE) to identify brain regions that were consistently more active during deceptive responses than during truthful responses and contrasted the ALE maps to those generated for the three different aspects of executive control. 
Their analysis revealed that deception related activity showed isolated overlap with regions of the brain implicated in working memory, but not inhibitory control or set-shifting. The authors concluded that while the latter executive control processes may play a role in deception, the recruitment of brain regions associated with working memory suggests that the working memory system may play a particularly important role in deception. The studies investigating the neural correlates of deception are important because they provide detailed information about the regions of the brain that are more active during deception than during truth telling and about the psychological processes associated with these regions (Vrij, 2008). While these studies point to the executive

control system as important to deception, they do not provide evidence regarding the causal relevance or functional contribution of this system to the behavioural correlates of deception. Priori et al. (2008) sought to investigate this connection by recording the time it took participant-senders (N = 15) to truthfully and deceptively answer questions about pictures they were carrying both before and after they had the activity in their prefrontal regions modulated by transcranial direct current stimulation (tDCS). The authors reported that while the tDCS did not affect the time it took participant-senders to initiate truthful responses, it did increase the time it took them to initiate deceptive responses. This effect, however, was only observed when they lied about the pictures they actually had and not when they lied about pictures they did not have. The authors concluded that the lies about the pictures they did not have were more complex lies and that they may have involved the recruitment of other cortical areas above and beyond the attentional control exerted by the dorsolateral PFC. While the results reported by Priori et al. (2008) suggest that the inhibition of the dorsolateral PFC results in impaired deceptive behaviour, other studies employing tDCS have reported facilitative effects on reaction time for lies when modulating this area (Mameli et al., 2010). Furthermore, modulating the anterior prefrontal cortex has produced facilitative effects on reaction times for lies, as well as a decrease in sympathetic skin-conductance and feelings of guilt while deceiving (Karim et al., 2010).
These seemingly conflicting results suggest that the functional contributions of executive control processes to the behavioural correlates of deception are complex and that while certain neural regions may be required to formulate and execute a deceptive response, other regions may be activated as a consequence of deception and associated with the emotional sequelae of deception. The research reviewed above provides strong evidence for the hypothesis that deception is generally more cognitively demanding than telling the truth and that the integration of the complex array of tasks required for successful deception is carried out by brain regions associated with executive control processes. Central capacity theories predict that task performance depends on an interaction between a person’s cognitive capacity and the cognitive demands of the task (Kahneman, 1973). It stands to reason that individual differences in cognitive capacity would be related to individual differences in task performance when the demands of the task are held constant. The following section reviews the empirical evidence for this hypothesis.


The Link between Cognitive Abilities and Deception Performance Few studies have examined how an individual’s cognitive capacity affects their deception performance. This question was first investigated by Morgan, LeSage, and Kosslyn (2009). They sought to investigate whether an individual’s performance on a computer-administered deception task was related to their cognitive abilities. To investigate this question they had participants (N = 39) complete the MiniCog Rapid Assessment Battery (MRAB) and the short form of the Raven’s Advanced Progressive Matrices (APM). This battery purportedly measures three types of attention (selective, divided and vigilance), two types of working memory (verbal and spatial), three types of reasoning (three-term series verbal deduction, spatial mental rotation, and set- switching) and simple perceptual motor reaction time. The deception task consisted of participants writing two short summaries; one regarding their most memorable work experience, the other regarding their most memorable vacation experience. Half the participants were then instructed to invent a work experience that never happened but could have happened while the other participants were instructed to invent a vacation experience that never happened but could have happened. All participants were then provided with a transcript of their invented experience and instructed to memorize the details. Two sets of questions were then created; one regarding their work experience, the other regarding their vacation experience. Each set of questions was presented to participants twice, with participants instructed to answer the first presentation of the set of questions deceptively and the second presentation truthfully. The time it took participants to respond to each question (initial reaction time; IRT) was recorded. Each participant’s deception performance was indexed by subtracting their mean IRT for truthful responses from their mean IRT for deceptive responses. 
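The difference score described above can be illustrated with a minimal Python sketch. The function name `irt_difference` and the reaction times are hypothetical and are not taken from Morgan et al. (2009); the sketch only shows the subtraction of the mean truthful IRT from the mean deceptive IRT, and it makes no assumption about how the sign or size of the resulting score was interpreted.

```python
from statistics import mean


def irt_difference(deceptive_irts, truthful_irts):
    """Return mean deceptive IRT minus mean truthful IRT (same units as the inputs)."""
    if not deceptive_irts or not truthful_irts:
        raise ValueError("each response set needs at least one reaction time")
    return mean(deceptive_irts) - mean(truthful_irts)


# Hypothetical reaction times (in milliseconds) for a single participant.
deceptive = [900, 1100, 1000]
truthful = [700, 800, 900]

index = irt_difference(deceptive, truthful)
print(index)
```

A positive value here simply means that the deceptive responses were, on average, slower to initiate than the truthful ones.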
The authors reported that deception performance in the memorized condition (where the participants responded to the deception trials with the previously memorized details of the experience that never happened) was related to a different subset of MRAB scores than deception performance in the spontaneous condition (where they responded to deception trials with freely invented responses). Specifically, deception performance in the memorized condition was negatively related to the reaction times for the MRAB mental rotation and spatial working memory tasks and the error rate for the MRAB cognitive set switching task, whereas it was positively related to the reaction

times for the MRAB vigilance task. In contrast, deception performance in the spontaneous condition was negatively related to the reaction times for the MRAB spatial working memory and filtering tasks, but positively related to the reaction times for the MRAB verbal working memory task and the error rates for the MRAB vigilance task. In a second experiment, Morgan, LeSage, and Kosslyn (2009) had participants (N = 42) tell the truth and lie spontaneously to questions about themselves as well as to questions about George W. Bush. The general procedure was the same as that used in their first experiment, except that each participant-sender’s truthful responses to all the questions were obtained after the deception task. The authors reported that when the questions were about themselves, deception performance was positively related to APM scores and the reaction times for the MRAB spatial working memory task. When the questions were about George W. Bush, however, deception performance was negatively related to APM scores and the error rates on the MRAB tasks of mental rotation and spatial working memory and positively related to the reaction times on the MRAB tasks of spatial working memory and vigilance. The authors concluded that the results support the idea that different types of lies arise from distinct cognitive processes and may cause different areas of the brain to be accessed. A particular limitation of this study is that deception performance was operationalized in a very restricted way (participant-senders answered each question with a single word with the difference between truthful and deceptive reaction times taken as the dependent variable). While the authors argued that IRT differences are indicative of greater demands upon an individual’s central processing capacity, this is not the same as the behavioural changes that a person manifests during a more complex act of deception.
A further limitation is that the cognitive battery used in the experiments was quite broad and did not focus on the particular cognitive processes implicated by the neuroimaging research. In an attempt to generate more specific predictions regarding the role of individual differences in executive control processes in relation to deception performance, Visu-Petra, Miclea, and Visu-Petra (2012) argued that a tripartite model of executive control processes (working memory, inhibitory control and set shifting) may better account for the differential involvement of distinct executive processes during deception. They hypothesized that while measures of inhibitory control and set shifting would be positively related to deception performance, enhanced working memory may interfere with the lying process as a stronger activation for the truth (presumed to be related to working memory skills) might undermine its subsequent inhibition during deception. To test their predictions, Visu-Petra et al. (2012) had participant-senders (N = 44) complete four tasks, each designed to measure a particular aspect of their executive abilities. Half of the participant-senders were then instructed to commit a mock crime where they interacted with six critical items while the other participant-senders did not commit the mock crime. All participant-senders then completed a reaction-time-based concealed information test (CIT). The test items were pictures that belonged to three categories: probes (the six critical items from the mock crime), targets (to-be-detected items, also from the same category as the probes) and irrelevants (four for each probe; items from the same category as the probes, not previously encountered during the experiment). Participant-senders were instructed to respond yes to target items and no to all other items. The authors reported that among the guilty participant-senders (those who completed the mock crime and were exposed to the critical items), those with better inhibition skills were marginally faster to respond to probe items (the six critical items from the mock crime) than those with poorer inhibition skills. They also reported that guilty participants with better set shifting skills were better able to correctly reject the probe and irrelevant items than those with poorer set shifting skills. The guilty participant-senders with better spatial working memory, however, tended to be slower to respond to all three categories of items, but more accurate in identifying target items.
The authors concluded that the results supported the hypothesis that executive control processes play an essential role in a person’s ability to accurately execute deceptive responses. They went on to state that although deception performance was positively related to cognitive set shifting and inhibition skills, there was a negative relationship between working memory skills and deception speed. Finally, they concluded that because no significant associations were observed between individual differences in executive abilities and deception difference scores (probes minus irrelevants), the aforementioned relationships are not uniquely related to deception, but rather affect performance on the CIT as a whole.

The authors’ conclusions regarding the null findings with regard to the deception difference scores are perplexing. The results reported in the article indicate that among the guilty participant-senders, those with better set shifting skills tended to show less discrimination between probe and irrelevant items (the difference between the correct rejection rate for probe items and the correct rejection rate for irrelevant items was smaller) compared to those with poorer set shifting skills. This result suggests that those with better set shifting skills were not only better at correctly rejecting probe and irrelevant items, but also better at correctly rejecting the probe items relative to the irrelevant items. This means that their responses would more closely approximate those of an innocent person. While these studies demonstrate that the speed with which a person can initiate a deceptive response, relative to the speed with which they can initiate a truthful response, is related to their executive abilities, it is unclear how more complex forms of deception, and the behaviours and credibility assessments associated with them, are related to executive abilities. To date, only one study has investigated how individual differences in cognitive abilities relate to estimates of credibility and detectability. Wright, Berry, and Bird (2012) had participants (N = 51) compete in a social interactive deception task designed to simultaneously assess both a person’s ability to encode deception and their ability to decode deception. Participants were seated in small groups while they completed an opinion survey. At the start of each trial the researcher gave one of the participants a cue card indicating which topic from the opinion survey they were to communicate to the other participants and whether they should provide their truthful opinion or a false opinion. At the end of each trial the other participants rated whether they thought the opinion was true or false.
Importantly, a sub-set of the participants also completed the Wechsler Abbreviated Scale of Intelligence (WASI; Wechsler, 1999). The WASI was designed to provide an abbreviated measure of intellectual functioning (Full Scale IQ) and consists of four subtests (block design, vocabulary, similarities and matrix reasoning). The authors reported that the credibility and detectability measures were not significantly related to the WASI scores. They concluded that “the data presented here merely indicate that variance in deceptive performance is not a consequence of IQ or emotional ability” (p. 6). Their acceptance of the null hypothesis, however, may not be warranted. A retrospective power analysis indicates that their study had a .38 probability of detecting a medium-sized bivariate correlation (r = 0.3) between deceptive ability and general intelligence. This means that, although there was no evidence of an association, the study’s sample was not particularly well placed to find one if it truly exists in the population. Furthermore, their study did not rely on theories of deception, which posit higher-order executive control processes, rather than general intelligence, as crucial to deception.
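The retrospective power figure quoted above can be approximated with a short calculation. The following is a minimal sketch and not the procedure used in the thesis: it assumes a two-tailed test at an alpha of .05 and the Fisher z approximation, and the function name `correlation_power` and the sample sizes in the loop are hypothetical, since the size of the WASI sub-sample is not stated here.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate two-tailed power to detect a population correlation r
    with n paired observations, using the Fisher z approximation."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = atanh(r) * sqrt(n - 3)
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


if __name__ == "__main__":
    # Hypothetical sample sizes, chosen only to show how power grows with n.
    for n in (20, 30, 50, 100):
        print(n, round(correlation_power(0.3, n), 2))
```

Under this approximation, power to detect r = 0.3 rises steadily with sample size, which is why a null result from a small sub-sample is weak evidence that no association exists.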

Good Liars and Poor Liars: Levels of Measurement and Central Capacity Theories

Implicit in all the aforementioned cognitive theories of deception is the idea that the cognitive demands of deception often exceed the liar’s ability to process the demands efficiently, resulting in a state of cognitive overload. This cognitive overload is what is thought to give rise to behavioural indicators of cognitive strain. Central capacity theories hold that cognitive processes draw from a finite pool of cognitive resources (Kahneman, 1973). If the amount of available resources exceeds the demands of the task then task performance will be at ceiling (perfect performance). In this instance, an increase in the amount of available resources does not result in an increase in task performance. In contrast, if the demands of the task exceed the amount of available resources then task performance will be below ceiling. In this instance, an increase in the amount of available resources does result in an increase in task performance, with task performance related to the amount of available resources. Whenever an increase in the amount of available resources results in an increase in task performance the task is said to be resource-limited, whereas whenever task performance is independent of the amount of available resources the task is said to be data-limited (Norman & Bobrow, 1975). In the context of deception, task performance can be measured at several different levels, the most common being reaction times, the incidence and/or intensity of behavioural cues, and perceptions of credibility. While reaction time is commonly taken as an indicator of cognitive load, with longer lag times indicating a greater processing requirement (more cognitive load), the incidence and/or intensity of behavioural cues and receiver perceptions of credibility are higher order measures of deception performance.
This has important implications for the previous research investigating how individual differences in cognitive abilities relate to deception performance because most of this research has used reaction time measures to operationalize deception performance. While reaction time measures provide useful insights into the amount and type of cognitive processing required during specific instances of deception, the relationships observed at one level of measurement do not necessarily generalize to other levels of measurement. This is because the higher order measures of deception performance (the incidence and/or intensity of behavioural cues and receiver perceptions of credibility) are mediated by additional factors. For instance, while senders with superior working memory abilities may take longer to initiate deceptive responses than those with poor working memory abilities, their deceptive responses may be more detailed and have greater structure than the responses produced by those with poor working memory abilities. Moreover, receivers may perceive more detailed and structured stories as more truthful, resulting in a positive relationship between working memory abilities and receiver perceptions of credibility. Alternatively, receivers may think that more detailed and structured stories indicate a greater level of rehearsal, resulting in a negative relationship between working memory abilities and receiver perceptions of credibility. Ultimately, conclusions about the ability to deceive depend on the level at which task performance is measured. To date there has been no research investigating whether the executive demands of deception are sufficient to influence receiver perceptions of credibility in a veracity judgement task.

General Summary

While recent research has demonstrated that the outcome of a veracity judgement depends more on the characteristics of the sender than on the characteristics of the receiver (Bond & DePaulo, 2008; Levine et al., 2011), only a few studies have attempted to identify the individual difference factors that contribute to the ability to deceive. Of the research that has investigated individual difference factors, several studies suffer from methodological limitations that restrict the conclusions that can be drawn from their data (the studies conducted by Burgoon & Floyd, 2000; Krauss, 1981; Miller et al., 1983; Riggio & Friedman, 1983) whereas several other studies did not operationalize sender performance in a manner that provides insight into the underlying processes of deception (the studies conducted by Riggio et al., 1988; Riggio et al., 1987; Riggio et al., 1987). While the empirical evidence from the above-mentioned studies is limited, the results from the series of studies conducted by DePaulo and colleagues
(DePaulo et al., 1983; DePaulo et al., 1985; DePaulo et al., 1988; DePaulo et al., 1991) suggest that the degree to which senders engage in behavioural control and/or experience/regulate emotional arousal may indeed influence receiver perceptions of credibility. Furthermore, the research investigating individual differences in personality traits/social skills has shown that senders who are better at regulating the outward expression of anxiety and at managing their nonverbal behaviour tend to have superior deceptive abilities compared to their less skilled counterparts (DePaulo & Rosenthal, 1970; Geis & Moon, 1981; Keating & Heltman, 1994). Importantly, while there is evidence to support the idea that the ability to deceive is influenced by the experience/regulation of emotional arousal and by the ability to control/match expressive behaviours to the beliefs of receivers, a largely overlooked factor in the individual differences literature concerns sender cognitive abilities. The research regarding the cognition of deception provides strong evidence that deception is more cognitively demanding than truth-telling. Not only do liars report higher levels of cognitive load than truth-tellers (Caso et al., 2005; Gozna et al., 2001; Vrij et al., 2001; Vrij & Mann, 2006; Vrij et al., 2006; Vrij et al., 1996; Wright et al., 2012), but they also tend to manifest more of the behavioural signs of cognitive load compared to those who simply tell the truth (DePaulo et al., 2003; Sporer & Schwandt, 2006; Zuckerman et al., 1981). Furthermore, liars who produce messages under cognitively taxing conditions tend to manifest more behavioural signs of cognitive load than liars who produce messages under less cognitively taxing conditions (Vrij et al., 2008; Vrij et al., 2009; Vrij et al., 2010; Vrij et al., 2012).
Moreover, receivers appear to be sensitive to the behavioural differences between these two groups of senders, with the receivers who judged the messages that were produced under cognitively taxing conditions obtaining higher accuracy rates than those judging the messages that were produced under less cognitively taxing conditions. These findings suggest that under the right circumstances, the cognitive demands of deception are sufficient to surpass a sender’s ability to regulate/process the demands without observable signs of cognitive strain and that receivers tend to be sensitive to these signs and correctly associate them with deception. Further evidence supporting the idea that deception is more cognitively demanding than truth-telling comes from neuroimaging research. This research has demonstrated that when people deceive, their brains tend to exhibit more neural activation than when they tell the truth (Abe, 2009; Abe, 2011; Gombos, 2006; Johnson et al., 2004; Langleben et al., 2005; Spence et al., 2004). Importantly, this increased neural activation is usually observed in the regions of the brain that are commonly associated with executive control processes (the prefrontal areas). Furthermore, research has shown that the modulation of neural activity in these regions influences deceptive behaviour (Karim et al., 2010; Mameli et al., 2010; Priori et al., 2008). Finally, individual differences in executive abilities have been shown to predict the speed with which a deceptive response is initiated (Morgan et al., 2009; Visu-Petra et al., 2012). Together these studies suggest that the cognitive demands of deception may be better specified as executive demands. While the above-mentioned research provides strong evidence for the role executive control processes play in deception, what remains unclear is whether the executive demands of deception are causally relevant to the diagnosticity of behavioural cues. Specifically, there has been no research investigating whether the executive demands of deception are sufficient to surpass the executive system’s capabilities for processing the demand without observable signs of cognitive strain. If senders with superior executive abilities are better liars than those with poor executive abilities (when deception performance is measured at the level of receiver perceptions of credibility), the executive demands of deception are clearly causally relevant to the diagnosticity of behavioural cues. This is an important area of research because it has the potential to refine current cognitive models of deception by identifying which executive abilities (if any) are more taxed during deception.
This area of research may also be of benefit as identifying the boundary conditions under which the executive demands of deception may be successfully processed without observable signs of cognitive strain may provide insights regarding the theoretical mechanism of action behind recent cognitive approaches to deception detection. Finally, these insights may be useful in developing more targeted and efficient cognitive interventions for deception detection and/or assist with matching interview protocols to the skills of the interviewee.

Thesis Aims

To bridge the gap between the research on the cognition of deception (the neuroimaging research and the research which uses specialized tools with reaction time based measures of task performance) and the research investigating individual difference factors (the research regarding the behavioural correlates of deception and the research regarding perceptions of credibility), this thesis aims to:

1. Provide the first empirical tests assessing whether the executive demands of deception are sufficient to influence sender behavioural displays, and thus receiver perceptions of credibility (whether the executive demands of deception are causally relevant to the diagnosticity of behavioural cues).

1.1. Take into account the cognitive costs associated with impression management and the costs associated with message production.

2. Identify which aspects of executive control (working memory updating, inhibitory control, or set shifting) contribute to the ability to deceive when deception performance is measured at the level of receiver perceptions of credibility.

Chapter 5: Investigating the executive demands of deception ______

SECTION 3: LABORATORY STUDIES

Chapter 5

Study 1 - Investigating the Executive Demands of Deception

According to central capacity theories, an individual’s specific cognitive strengths and weaknesses should predict his or her performance on various complex tasks that draw on these abilities (Kahneman, 1973). For example, performance on a simple inhibitory control task should predict performance on a more complicated procedure that also requires inhibitory control resources, provided the outcome measures from both tasks are resource-limited (Norman & Bobrow, 1975). In the context of deception, it follows that performance on simple executive control tasks should predict performance on more complex deception tasks that also require similar executive resources, provided the executive demands of the deception task are sufficient to cause observable signs of cognitive strain, and thus influence perceptions of credibility (the deception task is resource-limited when it is measured at the level of receiver ratings of credibility). This hypothesis is the focus of the current study. In order to investigate whether the executive demands of deception are sufficient to influence the diagnosticity of sender behavioural displays, and thus receiver perceptions of credibility, the current study draws from Miyake et al.’s (2000) model of executive control processes. Miyake and colleagues used a latent variable approach to analyse data from a large sample of young adults who had completed several complex cognitive tasks. They reported that the outcome measures from the tasks were best predicted by a model that included three somewhat distinct, yet interrelated components. These components were characterized as: (1) working memory updating, (2) inhibitory control, and (3) set shifting.
Based on the research regarding the cognition of deception, these three executive control processes may be important to the production and successful execution of a deceptive message because: (1) working memory updating is necessary to manage and arrange information into a coherent narrative, (2) inhibitory control is necessary to suppress truthful information, and (3) set shifting is necessary to switch between truthful and deceptive information in memory.

General Overview and Research Questions

The current three-part study uses two distinct samples of participants: (1) a sample of participant-senders and (2) a sample of participant-receivers. In part one of the study, each participant-sender was video and audio recorded while producing six messages regarding controversial socio/political issues. Of these six messages, two were produced under conditions where the participant-senders were unaware that their messages were going to be evaluated for veracity and where they were given no instructions on how they should behave during the interviews (naïve truth condition), two were produced under conditions where the participant-senders were informed that their messages were going to be evaluated for veracity and where they were instructed to try to appear as credible as possible (informed truth condition), and two were produced under conditions where the participant-senders were instructed to provide false opinions while trying to appear as credible as possible (deceptive condition). After each statement, participant-senders completed three self-report measures. These measures were designed to assess how much emotional activation, cognitive load, and behavioural control each participant-sender experienced during the preceding interview. A false opinion paradigm (truths and lies about opinions/beliefs) was chosen because determining the veracity of opinions/beliefs can be important in security settings. Leal, Vrij, Mann, and Fisher (2010) described a case example where seven Central Intelligence Agency (CIA) agents were killed in a suicide attack by a man they thought was going to give them information about Taliban and Al-Qaeda targets. While the agents were aware that the man had expressed anti-American views, they incorrectly determined that the expressed views were part of the man’s cover.
This incorrect veracity judgement contributed to the deaths of the CIA agents, highlighting the need for and importance of research that investigates the difference between true and false opinions/beliefs. The false opinion paradigm was also chosen because it is suitable for repeated-measures designs. This is an important feature as repeated-measures designs allow the calculation of sender detectability measures that are not influenced by individual differences in sender credibility (the tendency for some senders to be consistently judged as truthful while others tend to be consistently judged as deceptive). The calculation of these measures is discussed in more detail later in this chapter. In part two of the current study, participant-senders completed three executive control tasks, each considered to assess a certain aspect of their executive abilities
(working memory updating, inhibitory control, or set shifting). Part three of the current study sought to measure participant-sender performance in the deception task. This was accomplished by having a sample of participant-receivers evaluate the veracity of the messages produced by the participant-senders in the first part of the study. The participant-receivers made dichotomous judgements regarding message veracity (truth vs. lie). These judgements were used to create two sets of sender detectability scores. The first set was calculated from the sender scores for the naïve truth and deceptive conditions (when the conditions potentially differed with respect to the demands associated with message production and those associated with impression management). The second set was calculated from the sender scores for the informed truth and deceptive conditions (when the conditions potentially differed with respect to the demands associated with message production, but not with respect to those associated with impression management). Again, the calculation of these measures is discussed in more detail later in this chapter.

Research question 1: Replications. Before investigating novel research questions, it was appropriate to assess whether the experimental procedures used in the current study produced data in line with previous lie detection studies. Specifically, previous research has consistently demonstrated that senders tend to be detected at above chance levels (see Vrij, 2008). Previous research has also consistently demonstrated that some senders tend to be perceived as truthful while others tend to be perceived as deceptive (see Levine, Park, & McCornack, 1999). If the experimental procedures used in the current study are convergently valid with those used in previous research, we would expect to replicate these findings.

Research question 2: Self-report measures. To tease apart the demands associated with impression management from those associated with message production, it is important to assess how participant-senders’ experiences of emotional activation, cognitive load, and behavioural control differ over the naïve truth, informed truth, and deceptive conditions. These measures will assist the interpretation of the analyses involving the different sets of detectability scores (described later in this chapter).

Research question 3: Executive control tasks. If the executive demands of deception are sufficient to influence the diagnosticity of behavioural displays, and thus receiver perceptions of credibility, we would expect sender performance on the simple executive control tasks to be related to sender performance on the deception task (poor executive performance is expected to be associated with poor deception performance). Moreover, if deception is particularly demanding of a specific executive resource (working memory updating, inhibitory control, or set shifting), we would expect sender performance on the simple executive control task that relies on the same executive resource to be more strongly related to sender performance on the deception task. Finally, if the cognitive costs associated with impression management are insufficient to influence the diagnosticity of behavioural displays, and thus receiver perceptions of credibility, we would expect the magnitude of the relationship between sender performance on the simple executive control tasks and sender performance on the deception task to be the same regardless of which set of detectability scores was used to measure sender performance on the deception task. That is, we would expect the magnitude of the relationship to be approximately the same when the detectability scores are calculated from the sender scores for the naïve truth and deceptive conditions (when the conditions potentially differed with respect to the demands associated with message production and impression management) as when they are calculated from the sender scores for the informed truth and deceptive conditions (when the conditions potentially differed with respect to the demands associated with message production, but not with respect to those associated with impression management).

Method

Participant-Senders

Fifty-two undergraduate psychology students from the University of New South Wales served as participant-senders. They each received partial course credit for their participation. Of the 52 participant-senders, 35 (67.31%) identified themselves as female and 17 (32.69%) as male. Their mean age was 19.81 years (range 18 - 25, SD = 2.67). Participant-senders completed part two approximately one week after they had completed part one. Both parts took approximately 1.5 hours (total) to complete.

Participant-Receivers

Six hundred and twenty-four workers from an online self-enlisted workforce (Mechanical Turk; Buhrmester, Kwang, & Gosling, 2011; Paolacci, Chandler, & Ipeirotis, 2010) served as participant-receivers. They each received US 50¢ for their participation. All of the participant-receivers indicated that they resided in the United States. Of the 624 participant-receivers, 311 (49.84%) identified themselves as female and 313 (50.16%) as male. Their mean age was 30.31 years (range 18 - 80, SD = 13.21). Of the sample, 483 (77.40%) indicated that they most strongly identified as Caucasian, 62 (9.94%) as African American, 45 (7.21%) as Hispanic and 34 (5.45%) as other ethnic or cultural backgrounds. The majority of the sample indicated that they were native English speakers (n = 605, 96.96%), with the 19 (3.04%) non-native English speakers indicating that they had been speaking English for an average of 18.41 years (range 4 - 52, SD = 7.97). Participant-receivers completed part three of the study, which took approximately 15 minutes to complete.

Materials and Procedure: Part One

False opinion task. Participant-senders completed parts one and two of the current study. All participant-senders were misleadingly recruited to a study investigating the cognition of public opinion. Upon arrival, they were provided with a misleading consent form informing them that the purpose of the study was to investigate how certain aspects of cognition contribute to the formation of opinions. They were told that they would be required to complete a short paper and pencil opinion survey followed by several short interviews. They were also told that the interviews would be video and audio recorded and that during the interviews they would be required to provide a verbal justification regarding some of their responses on the opinion survey. Importantly, participant-senders were told that the recordings were for transcription purposes only and that nobody except the researchers involved in the study would be working with the materials. They were told that the purpose of the transcriptions was to convert their verbal accounts to a quantifiable data format. This deception was necessary to justify the recordings while not cueing participant-senders to the fact that the veracity of their messages would be evaluated during a later stage of the experiment. After consenting to the misleading procedure, participant-senders were provided with the paper and pencil opinion survey. The survey consisted of twenty controversial socio/political opinion statements, such as “I believe that modern music negatively influences children”, “I believe that there should be a tax on high fat food”, and “I believe that marijuana should be legalized”18. The complete opinion survey is presented in Appendix A. Participant-senders were asked to carefully read the survey and to indicate their opinion regarding each item using the scale provided. The scale anchors were: (1) strongly disagree, (2) disagree, (3) neutral, (4) agree, and (5) strongly agree. Once they had completed the opinion survey, the researcher entered each participant-sender’s strongly endorsed opinion items (items where the ‘strongly agree’ or ‘strongly disagree’ response was selected) into an online number randomizer. Seven strongly endorsed opinion items were then randomly selected to be the focus of the subsequent interviews. If a participant-sender did not strongly endorse seven opinion items, the endorsed opinion items (items where the ‘agree’ or ‘disagree’ response was selected) were entered into the online number randomizer. The endorsed opinion items were then selected at random until there was a total of seven opinion items (the strongly endorsed items plus the randomly selected endorsed items). The strongly endorsed and randomly selected endorsed items were then randomized again so that the strongly endorsed and endorsed items would be randomly distributed over the naïve truth, informed truth, and deceptive conditions. Once the seven opinion items were selected, participant-senders were seated in front of a video camera mounted on a tripod and told that they would now be required to provide a verbal account regarding some of their opinions. They were told that most participants speak for approximately twenty seconds and that they would only ever be asked about the statements where they had selected a non-neutral response.
The researcher then sat in front of the participant-sender with the camera positioned over the researcher’s shoulder. Each interview began with the researcher turning on the camera and asking the participant-sender “do you believe that [Australia should build more nuclear power plants]”. The text in brackets changed depending on which opinion item was the focus of the interview. When the participant-sender had completed their response the camera was turned off. After completing a practice interview, participant-senders completed two recorded interviews where they were given no further instructions on how to answer.

18 The socio/political issues were similar to those used in previous research. The controversial status of each issue is investigated later in this chapter.

These first two interviews constitute the naïve truth condition. Once participant-senders had completed the first two interviews, they were debriefed and the true aims of the study revealed. They were told that the true purpose of the recordings was to assess their ability to appear credible and that each recorded interview would be edited and shown to an independent sample of participant-receivers. They were told that the participant-receivers would be asked to discern the veracity of each message by analysing the verbal and nonverbal behaviours of the interviewees. Participant-senders were also told that out of the next four interviews they would be required to provide two false opinions (opinions contrary to their original response on the opinion survey). Finally, participant-senders were told that they should attempt to appear as credible as possible during the remaining interviews, regardless of whether they were providing their true opinion or a false opinion, and that a $100 gift voucher to the University of New South Wales bookstore would be awarded to the participant-sender who was rated as most credible across the remaining four interviews. Participant-senders were then provided with an additional consent form and asked if they were willing to continue with the study. They were told that if they did not wish to continue they would be excused from the rest of the study and awarded full credit. All the participant-senders continued with the study. Once participant-senders re-consented to the study, they were provided with four cue cards (two lie and two truth cards). They were instructed to shuffle the cards and place them face down on the table behind them. At the outset of each of the remaining interviews, participant-senders were asked to turn over the top cue card and respond to the following question in the manner named on the card. This was necessary to control for experimenter expectancies and to randomize the order of the final four interviews. 
Participant-senders were asked to place the card in a separate pile on the table behind them so as to keep track of the veracity of each message. The remaining interviews proceeded as previously described. The false opinion task took approximately one hour to complete.

Self-report measures. Immediately after each interview, participant-senders completed three computer-administered self-report measures designed to quantify their experiences of emotional activation, cognitive load, and behavioural control during the preceding interview, yielding six repeated measures (one for each message) on each of the three self-report scales. The self-report measure of emotional activation asked “relative to a typical conversation, how emotionally activated (nervous, anxious, etc.) were you during the preceding interview”. The self-report measure of behavioural control asked “relative to a typical conversation, to what extent did you try to purposefully control your expressive behaviours (where you looked, the placement of your arms, what you did with your hands, the expression on your face, etc.) during the preceding interview”. The self-report measure of cognitive load asked “relative to a typical conversation, how cognitively demanding (had to think hard, had difficulty answering the question, etc.) did you find the preceding interview”. Participant-sender responses to each self-report measure were made with a 7-point sliding scale (see Appendix B). The anchors of the scales were: (1) substantially below average, (2) moderately below average, (3) slightly below average, (4) no more than average, (5) slightly above average, (6) moderately above average, and (7) substantially above average. The sliding scales were centered at the middle point (4) and required participant-senders to move each slider before a response would be recorded. Once a participant-sender moved the slider, they were continuously presented with the slider’s position to two decimal places.

Materials and Procedure: Part Two

Battery of executive control tasks. After each participant-sender completed part one, they returned approximately one week later to complete three computer-administered executive control tasks. Each task was designed to measure a certain aspect of the participant-sender’s executive abilities. Specifically, an n-back task was used to measure working memory updating (Jaeggi, Seewer, Nirkko, Eckstein, Schroth, Groner, & Gutbrod, 2003; Jaeggi, Studer-Luethi, Buschkuehl, Su, Jonides, & Perrig, 2010; Owen, McMillan, Laird, & Bullmore, 2005), a cue dependent go/no-go task was used to measure inhibitory control (Fillmore, 2003), and the Wisconsin Card Sorting Task was used to measure set shifting (Berg, 1954; Miyake et al., 2000). Participant-senders completed the battery of executive tasks in groups of up to four people and were given verbal instructions before each task. The order of administration was held constant across participant-senders so that the relative differences between their performances would not include practice/order/depletion effects. The battery of executive control tasks took approximately 30 minutes to complete. The details of each executive control task are described below.


N-back task. Participant-senders were shown a sequence of letter stimuli (upper case consonants) and had to push the space bar each time the letter on the current trial was identical to the letter presented n trials before. All letters were shown in white and presented centrally on a black background for 500 milliseconds each, followed by a 2000 millisecond interval. The response window lasted from the onset of each letter until the presentation of the next letter (2500 milliseconds total). Excluding start trials, stimulus blocks consisted of 15 performance trials (5 target-present trials and 10 target-absent trials). For target-present trials the current letter was the same as the letter presented n trials before and required the participant to push the space bar. Target-absent trials did not require a response. Participant-senders were tested on four levels of n (1 through 4) with each level consisting of 3 stimulus blocks. The particular level of n participant-senders worked on was pseudo-randomly determined after each block. Before undertaking the task, participant-senders received four practice trials, one for each level of n. The dependent measure was the standardized proportion of correct target identifications (hits) minus the standardized proportion of incorrect target identifications (false alarms) averaged over all levels of n.

Cue dependent go/no-go task (GNG). Participant-senders were presented with 250 coloured rectangles and asked to push the space bar as quickly as possible when the rectangle was green (target-present trials) and to withhold pushing the space bar when the rectangle was blue (target-absent trials). Each trial began with the presentation of a fixation cross for 800 milliseconds followed by a blank white screen for 500 milliseconds.
Afterwards, a black outlined (unfilled) rectangle was presented for one of five time intervals: 100, 200, 300, 400, or 500 milliseconds (evenly distributed over the target-present and target-absent trials). The outline measured 7.5 × 2.5 cm and was presented in the centre of the screen in either a horizontal or vertical position. The orientation of the rectangle signalled the likely type of the forthcoming trial. Vertical rectangles preceded 80% of the target-present trials whereas horizontal rectangles preceded 80% of the target-absent trials. The target remained visible until a response occurred or 1000 milliseconds had elapsed. The inter-trial interval was 700 milliseconds. The task took approximately 10 minutes to complete. The dependent measure was the number of incorrect target identifications (false alarms) for target-absent trials that were preceded by a target-present cue (a vertical rectangle).
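The n-back dependent measure described above (standardized hits minus standardized false alarms) can be illustrated with a short sketch. It assumes that "standardized" means z-scoring each proportion across the sample of participant-senders, which the text does not state explicitly; the function names and the example values are hypothetical.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize values against their own sample mean and SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def nback_scores(hit_props, fa_props):
    """One score per participant: z(hit proportion) - z(false alarm proportion).

    Each input holds one value per participant, already averaged over
    all levels of n (assumed ordering of operations).
    """
    z_hits = zscores(hit_props)
    z_fas = zscores(fa_props)
    return [h - f for h, f in zip(z_hits, z_fas)]


# Hypothetical proportions for three participants.
scores = nback_scores([0.9, 0.7, 0.5], [0.05, 0.10, 0.15])
```

Higher scores indicate better working memory updating: many hits relative to the sample, combined with few false alarms.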


Wisconsin Card Sorting Task (WCST). Participant-senders were presented with 4 face cards towards the bottom of the screen. The face cards varied along three dimensions: colour, shape, and form. On the same screen they were also presented with 1 target card towards the top of the screen. Participant-senders were instructed to match the target card to one of the four face cards. On each trial, three of the four face cards exclusively matched the target card on a single dimension, whereas one of the face cards did not match the target card on any of the dimensions. While participant-senders were not told along which dimension to match the cards, they were given feedback (correct or incorrect) after each trial. Once the participant-sender achieved 4 consecutive correct matches, the matching rule was changed to a different dimension. The task continued until the participant-sender successfully completed 6 matching rules or 128 trials. The dependent measure was the number of incorrect responses that would have satisfied the previous matching rule divided by the total number of completed trials for each participant (percent perseverative errors).
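As a rough illustration of the WCST dependent measure, the sketch below counts perseverative errors from a simplified trial log. The trial representation (a tuple of the dimension the chosen card matched, the current rule, and the previous rule) and the function name are assumptions made for this example, not the format used by the actual task software.

```python
def percent_perseverative_errors(trials):
    """Percentage of completed trials that were perseverative errors.

    Each trial is (chosen_dimension, current_rule, previous_rule).
    chosen_dimension is None when the non-matching card was picked;
    previous_rule is None before the first rule change.
    """
    if not trials:
        raise ValueError("at least one completed trial is required")
    perseverative = sum(
        1
        for chosen, current, previous in trials
        # Incorrect under the current rule, but correct under the old one.
        if chosen != current and previous is not None and chosen == previous
    )
    return 100.0 * perseverative / len(trials)


# Hypothetical log: one perseverative error in four trials.
example = [
    ("colour", "colour", None),    # correct
    ("colour", "shape", "colour"),  # perseverative error
    ("form", "shape", "colour"),    # non-perseverative error
    ("shape", "shape", "colour"),   # correct
]
```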

Materials and Procedure: Part Three

Online veracity judgement task. Participant-receivers completed part three of the current study. They were first presented with an online consent form and instructions explaining that they were about to view ten recorded interviews of people stating various opinions. They were told that while some of these people were stating their true opinions, others were stating false opinions. They were asked to evaluate each interviewee’s verbal and nonverbal behaviours and to indicate whether they thought each interviewee was being truthful or deceptive. After reading the instructions and providing their consent, participant-receivers were presented with an audio validation screen. This screen consisted of a blank text entry field and an audio recording explaining that in order to proceed they must enter the validation word ‘CAT’ into the text entry field. This ensured that all participant-receivers had functioning audio systems and were able to hear the participant-sender messages. Once they entered the correct validation word, participant-receivers were pseudo-randomly presented with ten participant-sender messages. The pseudo-random assignment ensured that no participant-sender would appear more than once per sequence, thus eliminating the possibility of participant-senders contradicting themselves across messages and the participant-receivers using this information to inform their veracity judgements. Each trial began with the presentation of a message, during which time all the participant-receivers’ controls were locked. Once each message had finished playing, the task automatically proceeded to the response screen where the participant-receivers could indicate whether they thought the preceding message was truthful or deceptive via a forced choice response (truth vs. lie). After each participant-receiver had completed all ten trials, a ‘catch trial’ was presented. This consisted of a video-audio recording of the researcher stating that “in order to ensure data integrity, please select the lie response on the next screen”. This trial served to identify participant-receivers who were not paying close attention or were responding randomly (Paolacci, Chandler, & Ipeirotis, 2010). No participant-receivers failed the catch trial. The online veracity judgement task took approximately 15 minutes to complete.

Online opinion survey. After completing the online veracity judgement task, all participant-receivers were presented with a similar opinion survey to that used in part one (see Appendix C). The opinion survey consisted of the same 20 controversial socio/political opinion statements used in part one (the false opinion task) as well as an additional ten opinion items. The extended opinion survey was included to investigate whether the opinion items themselves may have influenced the accuracy of some of the veracity judgements. Opinion items where the majority of responses lay in one direction may be problematic because participant-receivers may have used this information to inform their veracity judgements.
For example, a participant-sender who deceptively claimed to disagree with the statement “marijuana should be legalized” may have received more correct identifications (receiver judgements of deception) than an equally skilled deceiver who deceptively claimed to agree with the same statement simply because the former position (disagreement) was perceived as more unlikely than the latter position (agreement). Importantly, rather than responding based on their own opinions, participant-receivers were asked to choose the response they thought best reflected the majority of Australian university students’ opinions. This is because the participant-receiver judgements would be influenced by their beliefs about the likelihood of certain positions (agreement vs. disagreement) among the group of participant-senders (Australian university students) and not their own opinions/beliefs (although the two are likely to be correlated).


Results

Operationalizing Sender Performance: A Primer Regarding Stimulus Models of Signal Detection Theory

In order to appropriately characterise participant-sender performance in the false opinion task, measures of sender detectability and credibility were calculated using signal detection theory parameters (SDT; Green & Swets, 1966). While SDT has traditionally been used to characterize receiver performance in veracity judgement tasks, in this thesis it is used to characterize sender performance. A brief overview of this approach is described below.

In part three of the current study, participant-receivers were each presented with ten messages and asked to discern the veracity of each message (truth vs. lie). Such tasks are notoriously difficult because while the liars and truth-tellers usually have some small systematic differences with regard to their verbal and nonverbal behaviours (there is typically a small amount of between-group variability), these relatively small differences occur in the context of large individual differences between senders (there is typically a large amount of within-group variability). For instance, while liars usually take longer to initiate responses compared to truth-tellers, the simple fact that a particular sender takes a long time to initiate a response does not necessarily mean that they are lying. This is because some people just take longer to initiate responses than others, irrespective of whether they are lying or telling the truth. In SDT, any systematic differences that occur across groups of liars and truth-tellers represent a ‘to be detected’ signal. Any individual differences between senders, on the other hand, represent noise. The difficulty for receivers is that the noise-to-signal ratio is typically high (Levine, 2010). According to SDT, receivers base their responses (truth vs. lie) on the value of a decision variable.
The value of this decision variable represents the receiver’s belief as to whether the particular sender in question is lying. On any given trial, receivers compare the value of this decision variable to some critical value. When the value of the decision variable is greater than this critical value (known as the receiver’s criterion), the receiver is said to be sufficiently confident that the sender in question is lying (that the signal is present) and will therefore select the ‘lie’ response; otherwise they will select the ‘truth’ response. There are four possible outcomes for each of the receiver’s decisions. The receiver may either: (1) correctly identify a deceptive message (a hit; state that the signal is present when it is present), (2) incorrectly identify a deceptive message (a miss; state that the signal is absent when it is present), (3) correctly identify a truthful message (a correct rejection; state that the signal is absent when it is absent), or (4) incorrectly identify a truthful message (a false alarm; state that the signal is present when it is absent). Importantly, each decision is associated with a particular receiver (whoever made the judgement) as well as a particular sender (whomever the judgement applied to). The manner with which the outcomes are grouped (receiver vs. sender) determines whether they can be used to describe receiver or sender performance.

In receiver SDT models, each hit that contributes to a receiver’s hit rate and each false alarm that contributes to their false alarm rate comes from a different sender/message. That is, in receiver SDT models, different senders/messages represent different trials. When this is the case, receiver hit rates reflect the proportion of the deceptive messages that were correctly identified by any given receiver. Specifically, in part three of the current study, each receiver judged ten senders/messages. If five of these messages were deceptive (for example), and a particular receiver correctly identified three of these messages (stated that they were indeed deceptive), the receiver’s hit rate would be 0.6 (3/5 = 0.6). Receiver false alarm rates, on the other hand, reflect the proportion of the truthful messages that were incorrectly identified by any given receiver. For instance, if five of the messages were truthful, and a particular receiver incorrectly identified four of these messages (stated that they were deceptive), the receiver’s false alarm rate would be 0.8 (4/5 = 0.8).
When different senders/messages represent different trials (receiver SDT models), hit and false alarm rates can be used to index receiver performance. Importantly, in receiver SDT models, the value of the decision variable for any given trial is influenced by the properties of the particular sender/message featured in the trial. For example, when the message is deceptive, the value of the decision variable will depend on the particular sender’s detectability (the intensity of the signal). Bad liars will induce higher decision variable values (be perceived as more deceptive) than good liars. The value of the decision variable will also depend on the particular sender’s credibility (their tendency to be believed). Less credible senders will induce higher decision variable values (be perceived as more deceptive) than more credible senders, irrespective of whether they are telling the truth or lying. As a result, the decision variable takes a range of different values across both the truthful (noise only) and deceptive (noise plus signal) trial types. Theoretically, within each group of trials, the distribution of the decision variable values would be normally distributed. As such, the false alarm rate reflects the proportion of truthful trials that induced decision variable values greater than the receiver’s criterion (illustrated by the green shaded area in Figure 1) while the hit rate reflects the proportion of deceptive trials that induced decision variable values greater than the receiver’s criterion (illustrated by the red shaded area in Figure 1). The values for the hit and false alarm rates can be used to estimate a receiver’s detection ability and credulity.

Figure 1. Distribution of decision variable values across truthful (noise only) and deceptive (noise plus signal) trials. The red shaded area reflects the proportion of the deceptive trials that induced decision variable values greater than the receiver’s criterion (indexed by the hit rate) while the green shaded area reflects the proportion of the truthful trials that induced decision variable values greater than the receiver’s criterion (indexed by the false alarm rate). d' indexes the degree of overlap between the two distributions (sensitivity) while C indexes the location of the criterion relative to the neutral point (the point where the two distributions cross over).

When different senders/messages represent different trials (receiver SDT models), differences between one receiver’s hit and false alarm rates and another receiver’s hit and false alarm rates are caused by differences in receiver properties; namely differences in receiver credulity (the receiver’s general tendency to believe messages) and differences in receiver detection ability (the receiver’s sensitivity to signals). The problem is that most performance measures confound receiver credulity and detection ability (Stanislaw & Todorov, 1999)19. This means that if one receiver obtains a higher correct identification rate (for example) than another receiver, it is unclear whether this difference was caused by the first receiver having greater detection abilities, more/less credulity, or both. SDT parameters do not suffer the same limitation. Theoretically, receiver detection ability is related to the degree to which the two distributions of decision variable values overlap (see Figure 1). Less overlap reflects greater detection abilities. The degree of overlap is related to the distance between the peaks of the two distributions and the standard deviations of the distributions. This means that detection ability can be quantified by determining the distance between the two distribution means relative to their standard deviations. This can be formally expressed as:

d' = Z(hit rate)–Z(false alarm rate)

d' is centered at 0 (the receiver did not distinguish between truthful and deceptive trials), with larger positive values indicating better detection ability (to some extent the receiver correctly distinguished between truthful and deceptive trials) and larger negative values indicating worse detection ability (to some extent the receiver incorrectly distinguished between truthful and deceptive trials). Importantly, d' is unaffected by changes in receiver credulity. That is, d' provides an unbiased measure of detection ability. As previously stated, receiver credulity is related to the location of the criterion (the point at which the receiver’s belief becomes sufficient to warrant a ‘lie’ response). The criterion’s location can be quantified relative to the position of the neutral point where neither response (truth vs. lie) is favoured (the point where the two distributions cross over). This can be formally expressed as:

C = –[Z(hit rate)+Z(false alarm rate)]/2

Similar to d', C is also centered at 0 (the neutral point where neither response was favoured by the receiver), with larger positive values indicating more credulity (the receiver had a tendency to believe messages) and larger negative values indicating less credulity (the receiver had a tendency to disbelieve messages).
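The two formulas above can be checked numerically with a short sketch. It uses the inverse of the standard normal cumulative distribution function for Z, as provided by Python's `statistics.NormalDist`; the function names are illustrative only. Note that hit or false alarm rates of exactly 0 or 1 have no finite Z value, and how such cases were handled is not described here, so the sketch does not cover them.

```python
from statistics import NormalDist

# Z(p): inverse of the standard normal cumulative distribution function.
_z = NormalDist().inv_cdf


def d_prime(hit_rate, false_alarm_rate):
    """d' = Z(hit rate) - Z(false alarm rate)."""
    return _z(hit_rate) - _z(false_alarm_rate)


def criterion_c(hit_rate, false_alarm_rate):
    """C = -[Z(hit rate) + Z(false alarm rate)] / 2."""
    return -(_z(hit_rate) + _z(false_alarm_rate)) / 2


# The hypothetical receiver from the earlier example:
# hit rate 0.6 (3/5) and false alarm rate 0.8 (4/5).
print(round(d_prime(0.6, 0.8), 2))      # about -0.59
print(round(criterion_c(0.6, 0.8), 2))  # about -0.55
```

For that hypothetical receiver, the negative d' indicates that truthful messages were judged deceptive more often than deceptive ones, and the negative C indicates a tendency to disbelieve messages.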

19 According to Stanislaw and Todorov (1999), this includes “the hit rate, the false alarm rate, the hit rate “corrected” by subtracting the false alarm rate, and the proportion of correct responses in a yes/no task” (p. 139).

When outcomes are organized such that different senders/messages represent different trials (receiver SDT models), hit and false alarm rates are influenced by the properties of receivers. Importantly, outcomes can also be organized such that different receivers represent different trials. That is, the outcomes are grouped according to whomever the judgements were applied to (senders) rather than whoever made the judgements (receivers). In these (sender) SDT models, each hit that contributes to a sender’s hit rate and each false alarm that contributes to their false alarm rate comes from a different receiver. When this is the case, hit rates reflect the proportion of the receiver judgements that relate to a given sender’s deceptive message(s) that were correct. Specifically, in the current study, each of a sender’s messages was judged by 20 receivers. As each sender produced two deceptive messages, sender hit rates reflect the proportion of these 40 receiver judgements that were correct. For example, if 24 of the receiver judgements that relate to a given sender’s deceptive messages were correct, the sender’s hit rate would be 0.6 (24/40 = 0.6). False alarm rates, on the other hand, reflect the proportion of the receiver judgements that relate to a given sender’s truthful message(s) that were incorrect. For example, if 32 of the receiver judgements that relate to a given sender’s truthful messages were incorrect, the sender’s false alarm rate would be 0.8 (32/40 = 0.8). When different receivers represent different trials (sender SDT models), hit and false alarm rates can be used to index sender performance. In sender SDT models, the decision variable values that underlie each receiver’s decision are grouped together, thus the decision variable for a given sender takes a range of different values across the truthful (receiver judgements of a particular sender’s truthful messages) and deceptive (receiver judgements of a particular sender’s deceptive messages) trial types.
To clarify, the trial types in receiver SDT models concern whether the messages are truthful or deceptive while the trial types in sender SDT models concern whether the judgements apply to truthful or deceptive messages. Furthermore, in receiver SDT models the trials within each type of trial consist of different senders/messages (truthful vs. deceptive) while in sender SDT models the trials within each type of trial consist of different receiver judgements of either truthful or deceptive messages. Whereas in receiver SDT models the degree of overlap between the two distributions is related to a receiver’s detection ability (the receiver’s sensitivity to signals), in sender SDT models it is related to a sender’s detectability (their ability to deceive/the intensity of their signal). d' is still centered at 0 (the judgements of the deceptive messages were indistinguishable from the judgements of the truthful messages), with larger positive values indicating greater detectability (the judgements of the deceptive messages were correctly distinguished from the judgements of the truthful messages) and larger negative values indicating less detectability (the judgements of the deceptive messages were incorrectly distinguished from the judgements of the truthful messages). C is also still centered at 0 (the neutral point where neither response was favoured), with larger positive values indicating more credibility (the sender had a tendency to be believed) and larger negative values indicating less credibility (the sender had a tendency to be disbelieved). Importantly, in sender SDT models, d' is still unaffected by changes in C. That is, d' and C provide unbiased measures of sender detectability and credibility, respectively.

As this thesis investigates factors that may account for individual differences in sender performance, sender detectability scores are used to index a sender’s ability to deceive while sender credibility scores are used to index a sender’s tendency to be judged as truthful. That is, this thesis uses sender SDT models. An advantage of the current study is that it allows for the calculation of two measures of sender detectability. One of these measures (informed detectability scores) is calculated using the sender hit rates for the deceptive condition (the proportion of the 40 receiver judgements that relate to a given sender’s deceptive messages that were correct) and the false alarm rates for the informed truth condition (the proportion of the 40 receiver judgements that relate to a given sender’s informed truthful messages that were incorrect). Ideally, these informed detectability scores will include only the effect of message production (see Chapter 2).
They should not include the effect of impression management because both the informed truth and deceptive conditions were given the same objective (try to appear as credible as possible). The other measure of sender detectability (naïve detectability scores) is calculated using the sender hit rates for the deceptive condition and the false alarm rates for the naïve truth condition (the proportion of the 40 receiver judgements that relate to a given sender’s naïve truthful messages that were incorrect). Ideally, these naïve detectability scores will include the effect of message production as well as the effect of impression management. This is because the naïve truth and deceptive conditions did not have the same objective (only the deceptive condition received the credibility induction).
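The sender-level grouping described above can be sketched as follows. The record layout (sender, condition, response), the condition labels, and the function name are hypothetical choices made for this illustration; the sketch also assumes every sender has judgements in both conditions and that no rate is exactly 0 or 1.

```python
from collections import defaultdict
from statistics import NormalDist


def sender_detectability(judgements, truth_condition):
    """Sender d' scores, treating each receiver judgement as a trial.

    judgements: iterable of (sender_id, condition, response), where
    condition is "deceptive", "informed_truth" or "naive_truth" and
    response is "lie" or "truth" (labels assumed for illustration).
    truth_condition selects which truthful condition supplies the
    false alarm rate (informed vs. naive detectability).
    """
    # sender -> condition -> [number of 'lie' responses, total judgements]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for sender, condition, response in judgements:
        cell = counts[sender][condition]
        cell[0] += response == "lie"
        cell[1] += 1

    z = NormalDist().inv_cdf
    scores = {}
    for sender, by_condition in counts.items():
        lies_dec, n_dec = by_condition["deceptive"]
        lies_true, n_true = by_condition[truth_condition]
        hit_rate = lies_dec / n_dec            # 'lie' judgements of lies
        false_alarm_rate = lies_true / n_true  # 'lie' judgements of truths
        scores[sender] = z(hit_rate) - z(false_alarm_rate)
    return scores
```

Calling the function once with `"informed_truth"` and once with `"naive_truth"` gives the two detectability measures for each sender from the same pool of deceptive-message judgements.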


Research Question 1: Replications

To assess whether the experimental procedures used in the current study replicated some of the findings reported in previous research (that senders tend to be somewhat detectable), the naïve and informed detectability scores were compared against hypothetical null distributions via two one-sample t-tests. As each comparison pertained to the same research question, the significance of each comparison was evaluated against a Šidák adjusted alpha level of .025 (Kirk, 1995; Šidák, 1967; Sokal & Rohlf, 1995; Quinn & Keough, 2002)20. The tests indicated that the mean naïve detectability score (where sender performance reflects differences between the naïve truth and deceptive conditions; M = 0.23, SD = 0.55, 95% CI [0.08, 0.39]) was significantly greater than that expected by chance (a mean sender detectability score of 0), t(51) = 3.08, p < .005, d = 0.43, 95% CI [0.14, 0.71]. This result suggests that senders tend to be perceived as more credible when they are telling the truth and taking their credibility for granted than when they are lying. The tests also indicated that the mean informed detectability score (where sender performance reflects differences between the informed truth and deceptive conditions; M = 0.17, SD = 0.50, 95% CI [0.03, 0.31]) was significantly greater than that expected by chance, t(51) = 2.44, p = .02, d = 0.34, 95% CI [0.06, 0.62]. This result suggests that senders tend to be perceived as more credible when they are telling the truth and not taking their credibility for granted than when they are lying.

In addition to assessing whether the experimental procedures used in this study produced somewhat detectable senders, the research question also pertained to sender credibility scores. To assess whether the experimental procedures used in the current study produced a veracity effect, the naïve and informed credibility scores were also compared against hypothetical null distributions via two one-sample t-tests.
The tests indicated that the mean naïve credibility score (M = 0.19, SD = 0.34, 95% CI [0.10, 0.29]) was significantly greater than that expected if there was no veracity effect, t(51) = 4.10, p < .005, d = 0.49, 95% CI [0.27, 0.86]. The tests also indicated that the mean informed credibility score (M = 0.15, SD = 0.30, 95% CI [0.06, 0.23]) was significantly greater than that expected if there was no veracity effect, t(51) = 3.52, p < .005, d = 0.49, 95% CI [0.20, 0.77]. Together, these results suggest that the experimental procedures used in the current study produced a veracity effect (senders tend to be believed).

20 The pairwise alpha level was adjusted using the correction $\alpha_{pc} = 1 - (1 - \alpha_{fw})^{1/k}$, where $\alpha_{pc}$ represents the pairwise alpha level, $\alpha_{fw}$ represents the familywise alpha level, and $k$ represents the number of comparisons (Šidák, 1967). The correction was applied using a familywise alpha level of .05.
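The adjusted alpha levels used in this chapter can be checked by evaluating the Šidák correction in footnote 20 directly. The following is a minimal sketch only; the function name `sidak_alpha` is an illustrative assumption and is not taken from the original analyses.

```python
def sidak_alpha(familywise_alpha: float, k: int) -> float:
    """Per-comparison alpha under the Sidak correction for k comparisons.

    Implements alpha_pc = 1 - (1 - alpha_fw) ** (1 / k).
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 - (1.0 - familywise_alpha) ** (1.0 / k)


# Familywise alpha of .05, as stated in the footnote.
for comparisons in (2, 3):
    adjusted = sidak_alpha(0.05, comparisons)
    print(f"k = {comparisons}: adjusted alpha = {adjusted:.3f}")
```

With two comparisons the correction rounds to .025, and with three comparisons it rounds to .017, which are the two adjusted alpha levels reported in the text.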

Research Question 2: Self-Report Measures

To assess whether the mean levels of self-reported emotional activation, cognitive load, and/or behavioural control varied across the three experimental conditions, three repeated measures ANOVAs were conducted. Each ANOVA contained a within-subjects factor for condition (naïve truth vs. informed truth vs. deceptive) and one of the three self-report measures as the dependent variable. As each analysis tested different specific hypotheses, the familywise error rate for each set of comparisons was set to .05. Within each analysis, however, the comparisons were related; each of the three planned contrasts (naïve truth vs. informed truth; naïve truth vs. deceptive; informed truth vs. deceptive) was therefore evaluated against a Šidák adjusted alpha level of .017. Following the recommendations proposed by Steiger (2004), 90% confidence intervals were placed around one-sided estimates whereas 95% confidence intervals were placed around two-sided estimates. Furthermore, following the recommendations proposed by Ruxton and Beauchamp (2008), only the results from the planned comparisons will be reported and interpreted.

Emotional activation. The mean self-reported emotional activation scores and their respective 95% confidence intervals are presented in Figure 2. The contrast tests revealed that the mean difference between the self-report ratings of emotional activation for the naïve truth and informed truth conditions was not statistically significant, F(1, 51) = 1.05, MSE = 2.21, p = .31, ηp² = .02, 90% CI [.00, .12], with the mean self-report rating for the naïve truth condition only 0.21 units higher (95% CI [-0.20, 0.63]) than the mean self-report rating for the informed truth condition. This means that on average the participant-senders reported experiencing approximately the same amount of emotional activation in the naïve truth condition as they did in the informed truth condition.
By contrast, the mean difference between the self-report ratings of emotional activation for the naïve truth and deceptive conditions was statistically significant, F(1, 51) = 8.26, MSE = 2.85, p = .01, ηp² = .14, 90% CI [.02, .28], with the mean self-report rating for the naïve truth condition 0.67 units lower (95% CI [0.20, 1.14]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing higher levels of emotional activation in the deceptive condition than in the naïve truth condition. Likewise, the mean difference between the self-report ratings of emotional activation for the informed truth and deceptive conditions was also statistically significant, F(1, 51) = 15.59, MSE = 2.61, p < .005, ηp² = .23, 90% CI [.08, .38], with the mean self-report rating for the informed truth condition 0.89 units lower (95% CI [0.44, 1.34]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing higher levels of emotional activation in the deceptive condition than in the informed truth condition.

[Figure 2 image: y-axis, Emotional Activation Scores (1 to 7); legend, Naive Truth Condition, Informed Truth Condition, Deceptive Condition.]

Figure 2. Mean emotional activation scores by condition. Error bars represent 95% confidence intervals.
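For single-degree-of-freedom contrasts such as those reported for emotional activation, partial eta squared can be recovered from the F ratio and its degrees of freedom. The sketch below is illustrative only: it uses the standard conversion, and the function name `partial_eta_squared` and the dictionary of F values are assumptions introduced here for checking the reported effect sizes, not part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio: (F * df1) / (F * df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios reported in the text for the emotional activation contrasts,
# each tested on 1 and 51 degrees of freedom.
emotional_activation_f = {
    "naive truth vs. informed truth": 1.05,
    "naive truth vs. deceptive": 8.26,
    "informed truth vs. deceptive": 15.59,
}

for contrast, f_value in emotional_activation_f.items():
    effect = partial_eta_squared(f_value, 1, 51)
    print(f"{contrast}: partial eta squared = {effect:.2f}")
```

The three values obtained this way round to .02, .14, and .23, matching the effect sizes reported for the emotional activation contrasts.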

Cognitive load. The mean self-reported cognitive load scores and their respective 95% confidence intervals are presented in Figure 3. The contrast tests revealed that the mean difference between the self-report ratings of cognitive load for the naïve truth and informed truth conditions was statistically significant, F(1, 51) = 6.28, MSE = 3.75, p = .01, ηp² = .11, 90% CI [.01, .25], with the mean self-report rating for the naïve truth condition 0.67 units lower (95% CI [0.13, 1.21]) than the mean self-report rating for the informed truth condition. This means that on average the participant-senders reported experiencing higher levels of cognitive load in the informed truth condition than in the naïve truth condition. Likewise, the mean difference between the self-report ratings of cognitive load for the naïve truth and deceptive conditions was also statistically significant, F(1, 51) = 31.35, MSE = 4.43, p < .005, ηp² = .38, 90% CI [.20, .51], with the mean self-report rating for the naïve truth condition 1.64 units lower (95% CI [1.05, 2.22]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing higher levels of cognitive load in the deceptive condition than in the naïve truth condition. Finally, the mean difference between the self-report ratings of cognitive load for the informed truth and deceptive conditions was also statistically significant, F(1, 51) = 9.97, MSE = 4.82, p < .005, ηp² = .16, 90% CI [.04, .31], with the mean self-report rating for the informed truth condition 0.96 units lower (95% CI [0.35, 1.57]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing higher levels of cognitive load in the deceptive condition than in the informed truth condition.

[Figure 3 image: y-axis, Cognitive Load Scores (1 to 7); legend, Naive Truth Condition, Informed Truth Condition, Deceptive Condition.]

Figure 3. Mean cognitive load scores by condition. Error bars represent 95% confidence intervals.
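The 95% confidence intervals around the mean differences can be approximately reconstructed from the reported contrast statistics, because for a contrast with one numerator degree of freedom the t statistic is the square root of F. The sketch below rests on several assumptions: the helper name `diff_ci_from_f` is hypothetical, the critical value of t with 51 degrees of freedom is hard-coded from standard tables (approximately 2.008), and the inputs are the rounded values printed in the text, so agreement is only expected to within rounding.

```python
import math

# Assumed two-sided 95% critical value of t with 51 degrees of freedom.
T_CRIT_51 = 2.0076


def diff_ci_from_f(mean_diff: float, f_value: float, t_crit: float = T_CRIT_51):
    """Approximate CI for a mean difference from a 1-df contrast F ratio.

    Uses t = sqrt(F), so the standard error is |mean_diff| / t.
    """
    t_value = math.sqrt(f_value)
    standard_error = abs(mean_diff) / t_value
    half_width = t_crit * standard_error
    return mean_diff - half_width, mean_diff + half_width


# Mean differences and F ratios reported for the cognitive load contrasts.
cognitive_load = {
    "naive truth vs. informed truth": (0.67, 6.28),
    "naive truth vs. deceptive": (1.64, 31.35),
    "informed truth vs. deceptive": (0.96, 9.97),
}

for contrast, (diff, f_value) in cognitive_load.items():
    lower, upper = diff_ci_from_f(diff, f_value)
    print(f"{contrast}: [{lower:.2f}, {upper:.2f}]")
```

Because the mean differences and F ratios are themselves rounded to two decimal places, the reconstructed bounds may differ from the reported intervals in the second decimal place.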

Behavioural control. The mean self-reported behavioural control scores and their respective 95% confidence intervals are presented in Figure 4. The contrast tests revealed that the mean difference between the self-report ratings of behavioural control for the naïve truth and informed truth conditions was statistically significant, F(1, 51) = 92.98, MSE = 1.99, p < .005, ηp² = .65, 90% CI [.50, .73], with the mean self-report rating for the naïve truth condition 1.89 units lower (95% CI [1.49, 2.28]) than the mean self-report rating for the informed truth condition. This means that on average the participant-senders reported experiencing higher levels of behavioural control in the informed truth condition than in the naïve truth condition. Likewise, the mean difference between the self-report ratings of behavioural control for the naïve truth and deceptive conditions was also statistically significant, F(1, 51) = 82.38, MSE = 2.11, p < .005, ηp² = .62, 90% CI [.47, .70], with the mean self-report rating for the naïve truth condition 1.83 units lower (95% CI [1.43, 2.22]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing higher levels of behavioural control in the deceptive condition than in the naïve truth condition. By contrast, the mean difference between the self-report ratings of behavioural control for the informed truth and deceptive conditions was not statistically significant, F(1, 51) = 0.23, MSE = 0.76, p = .64, ηp² = .00, 90% CI [.00, .08], with the mean self-report rating for the informed truth condition only 0.06 units higher (95% CI [-0.19, 0.30]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing the same amount of behavioural control in the informed truth condition as they did in the deceptive condition.

[Figure 4 image: y-axis, Behavioural Control Scores (1 to 7); legend, Naive Truth Condition, Informed Truth Condition, Deceptive Condition.]

Figure 4. Mean behavioural control scores by condition. Error bars represent 95% confidence intervals.

Research Question 3: Executive Control Tasks

To assess whether the outcomes from the executive control tasks predicted any of the variance in the sender detectability scores and whether the magnitude of the predictive relationship changed depending on which type of truthful message (naïve vs. informed truth) was used to operationalize sender detectability, two multiple regression models were fit to the data. The first model regressed the naïve detectability scores (where the false alarm rates for the naïve truth condition were used to operationalize sender detectability) onto the outcomes from the executive control tasks whereas the second model regressed the informed detectability scores (where the false alarm rates for the informed truth condition were used to operationalize sender detectability) onto the outcomes from the executive control tasks. To assist with model interpretation, the directionality of the measures was made consistent such that larger scores always represented worse performance²¹. Before interpreting the regression models, however, preliminary analyses were conducted to assess the reliability of the outcomes from the false opinion task, the dispersion of the outcomes from the executive control tasks, and the assumptions underlying the regression procedure.

To assess the intra-individual consistency of the outcomes from the false opinion task, the inter-correlations between the by-sender aggregated truth scores (the number of truthful responses divided by the number of total responses) for the six messages were examined. Table 3 displays the mean truth scores and their standard deviations, as well as the inter-correlations between the six messages. The pattern of inter-correlations indicates a modest degree of intra-sender consistency such that the participant-senders who were judged as more truthful in any one of their messages tended to be judged as more truthful in their other messages as well.

Table 3
Inter-Correlations Between Messages

Message               1.      2.      3.      4.      5.
1. Naive truth 1      --      --      --      --      --
2. Naive truth 2      .55**   --      --      --      --
3. Informed truth 1   .34**   .38**   --      --      --
4. Informed truth 2   .32**   .39**   .39**   --      --
5. False opinion 1    .31**   .34**   .37**   .42**   --
6. False opinion 2    .36**   .35**   .35**   .38**   .48**

Note. * p < .01, two-tailed. ** p < .05, two-tailed.

To assess whether the executive control tasks appropriately distributed individual differences across the outcome measures, descriptive statistics, histograms, and P-P plots were examined. While the distributions for the WCST and the n-back task appeared approximately normal in shape, the distribution for the GNG task appeared to contain a modest amount of positive skew. The positive skew in the GNG task outcomes was caused by a ceiling effect, with 12 participant-senders (23.10%) having perfect performance on the task (no errors). As the regression model’s residuals appeared to be approximately normal in shape, however, the influence of the positive skew was considered to be minimal. The descriptive statistics for the executive control tasks are presented in Table 4.

21 The directionality of the N-back scores was reversed by subtracting each participant-sender’s score from the maximum possible score. The other measures were already scaled such that higher scores represented worse performance.


Table 4
Descriptive Statistics for the Executive Control Tasks

Task      Mean    SD      Range          Skewness   Kurtosis
WCST      0.17    0.08    0.01 - 0.38     0.01      -0.17
N-back    2.07    0.67    0.37 - 3.65    -0.14       0.44
GNG       2.92    2.49    0 - 9           0.90      -0.70

To assess the assumptions underlying the regression procedures, several analyses were conducted. Normality of the regression residuals was assessed by examining histograms and P-P plots. Homoscedasticity was assessed by examining plots of the standardized residuals as a function of the standardized predicted values. The assumption of independent errors was met by the design of the study. Linearity/co-linearity was assessed by examining the zero-order Pearson correlation coefficients between the executive control tasks and each set of detectability scores and their respective scatterplots. The zero-order correlations are presented in Table 5.

Table 5
Zero-Order Correlations Between the Executive Control Tasks and Detectability Scores

Task                             WCST    N-back   GNG
WCST                             --      --       --
N-back                           .00     --       --
GNG                              -.09    .19      --
Naive Detectability Scores       .24†    .19      -.07
Informed Detectability Scores    .10     .24†     .07

Note. † p < 0.1, two-tailed.
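The significance of a zero-order correlation can be checked by converting r to a t statistic with n − 2 degrees of freedom. The sketch below is illustrative and makes two assumptions: that all 52 participant-senders contributed to each correlation, and that the two-tailed critical values for 50 degrees of freedom are the standard table values hard-coded in the constants. The helper names `t_from_r` and `label` are hypothetical.

```python
import math

# Assumed two-tailed critical values of t with 50 degrees of freedom
# (standard table values); n = 52 is assumed for every correlation.
T_CRIT_10 = 1.676
T_CRIT_05 = 2.009


def t_from_r(r: float, n: int) -> float:
    """Convert a Pearson correlation to its t statistic with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def label(r: float, n: int = 52) -> str:
    """Classify a correlation against the .05 and .10 two-tailed cut-offs."""
    t_value = abs(t_from_r(r, n))
    if t_value >= T_CRIT_05:
        return "p < .05"
    if t_value >= T_CRIT_10:
        return "p < .10"
    return "n.s."


for r in (0.24, 0.19, 0.10):
    print(f"r = {r:.2f}: t(50) = {t_from_r(r, 52):.2f}, {label(r)}")
```

Under these assumptions, a correlation of .24 falls between the .10 and .05 cut-offs, whereas correlations of .19 and below do not reach the .10 level.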

An examination of Table 5 indicates that while the three executive control tasks had little to no associations with one another, the n-back task had small positive associations with the naïve and informed detectability scores (poor working memory updating was associated with greater sender detectability). The WCST also showed a small positive association with the naïve detectability scores (poor set shifting was associated with greater naïve detectability scores). These relationships, however, were not statistically significant at the .05 level. While these results suggest that the assumption of linearity may have been violated, eliminating theoretically important variables based on a non-significant bivariate relationship may be premature (Pandey & Elliott, 2010). Indeed, Bertrand and Holder (1988) argued that “simultaneous regression on all relevant variables is essential to obtain a correct understanding of the underlying process” (p. 371). The sender detectability scores were therefore regressed onto the three outcomes from the executive control tasks. The results of the regression analyses are presented in Table 6.

Table 6
Summary of Multiple Regression Analyses

                                        95% CI for β
Variable    B       SE B    β       LL      UL      t       p
Naïve Detectability Scores ᵃ
  WCST      1.57    0.92    0.23    -.05    .47     1.70    .10
  N-back    0.17    0.11    0.20    -.07    .45     1.45    .15
  GNG       0.02    0.03    0.09    -.19    .35     0.60    .55
Informed Detectability Scores ᵇ
  WCST      0.62    0.87    0.10    -.18    .36     0.71    .48
  N-back    0.18    0.11    0.24    -.04    .48     1.69    .10
  GNG       0.01    0.03    0.03    -.24    .30     0.19    .85

Note. ᵃ R² = .10, p = .17. ᵇ R² = .07, p = .32.

An examination of Table 6 indicates that while the outcomes from the executive control tasks predicted around 10% of the variance in the naïve detectability scores, the regression model was not statistically significant, nor were any of the individual path coefficients. Similarly, while the outcomes from the executive control tasks predicted around 7% of the variance in the informed detectability scores, neither the model as a whole nor any of the individual path coefficients were statistically significant.
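The overall test of each regression model can be related to its R² through the usual F ratio for a model with k predictors and n cases. The following sketch is illustrative only: the function name `overall_f` is hypothetical, n = 52 and k = 3 are taken from the study description, and the .05 critical value for F(3, 48) is an approximate table value hard-coded as an assumption.

```python
# Assumed approximate .05 critical value of F with 3 and 48 degrees of freedom.
F_CRIT_3_48 = 2.80


def overall_f(r_squared: float, n: int, k: int):
    """Overall F ratio for a regression with k predictors and n cases."""
    df_model = k
    df_error = n - k - 1
    f_value = (r_squared / df_model) / ((1.0 - r_squared) / df_error)
    return f_value, df_model, df_error


# R-squared values reported for the two regression models.
for outcome, r_squared in (("naive detectability", 0.10),
                           ("informed detectability", 0.07)):
    f_value, df1, df2 = overall_f(r_squared, n=52, k=3)
    verdict = "significant" if f_value >= F_CRIT_3_48 else "not significant"
    print(f"{outcome}: F({df1}, {df2}) = {f_value:.2f}, {verdict} at .05")
```

Both F ratios fall below the assumed critical value, in line with the non-significant model tests reported above.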

Post Hoc Analysis of Opinion Items

To investigate whether the opinion items themselves may have influenced the veracity judgements, the data from the online opinion survey completed by the participant-receivers in part three were used to bootstrap 95% confidence intervals for the average participant-receiver response to each item on the survey. If the lower bound of an item’s associated confidence interval was less than 2.5 or the upper bound was greater than 3.5, it was possible that the correct identification rate for the messages about the item may have been biased by the participant-receivers’ perceptions regarding the likelihood that a participant-sender would hold a position in agreement or disagreement. The mean participant-receiver responses and their respective 95% confidence intervals are presented in Table 7. The results of the analysis indicated that 5 of the 20 opinion items used in part one (the false opinion task) may have biased the accuracy of some of the veracity judgements. These opinion items were: (2) “fewer boatpeople should be accepted into Australia as refugees”, (3) “the rich should pay more taxes”, (11) “marijuana should be legalized”, (13) “video games are bad for children”, and (15) “human genetic engineering is unethical”. For opinion items 2, 3, and 15, participant-receivers on average perceived a position in agreement to be more likely than a position in disagreement. For opinion items 11 and 13, on the other hand, participant-receivers on average perceived a position in disagreement to be more likely than a position in agreement. A further three opinion items were found to be potentially problematic (opinion items 24, 27, and 29); however, these items were not included in the survey that was completed by the participant-senders in part one (the false opinion task). This means that these items could not have biased the results of the current study, but that they may be problematic if used in future research.

Table 7
Means and 95% Confidence Intervals for Opinion Items

                       95% CI                              95% CI
Opinion item   Mean    LL      UL      Opinion item   Mean    LL      UL
1              2.97    2.89    3.05    16             3.04    2.96    3.11
2*             3.97    3.88    4.05    17             3.02    2.95    3.10
3*             4.48    4.3     4.56    18             3.40    3.33    3.48
4              2.77    2.70    2.85    19             3.05    2.97    3.13
5              2.98    2.91    3.07    20             3.00    2.93    3.08
6              3.06    2.97    3.16    21             2.65    2.58    2.72
7              2.95    2.87    3.03    22             3.01    2.93    3.08
8              3.05    2.97    3.13    23             3.04    2.97    3.13
9              3.35    3.26    3.43    24*            2.35    2.28    2.42
10             2.91    2.83    3.00    25             3.42    3.35    3.49
11*            1.56    1.48    1.64    26             3.34    3.28    3.41
12             3.00    2.91    3.07    27*            2.43    2.36    2.50
13*            2.06    1.98    2.13    28             2.79    2.72    2.86
14             2.98    2.90    3.07    29*            1.96    1.88    2.04
15*            3.43    3.34    3.51    30             2.96    2.87    3.03

Note. * indicates opinion items that had a confidence interval with a lower bound less than 2.5 or an upper bound greater than 3.5.
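The bootstrap procedure and the flagging rule described for the opinion items can be sketched as follows. This is a minimal illustration under stated assumptions: a percentile bootstrap is assumed (the text does not specify the variant), the ratings are hypothetical responses on an assumed 1 to 5 agreement scale, and the names `bootstrap_mean_ci` and `is_flagged` are introduced here for illustration only.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lower_idx = round(tail * n_resamples)
    upper_idx = round((1.0 - tail) * n_resamples) - 1
    return means[lower_idx], means[upper_idx]


def is_flagged(ci, lower_cut=2.5, upper_cut=3.5):
    """Apply the rule from the text: lower bound < 2.5 or upper bound > 3.5."""
    lower, upper = ci
    return lower < lower_cut or upper > upper_cut


# Hypothetical ratings, not the survey data from the study.
balanced_item = [1, 2, 3, 4, 5] * 40          # responses centred on 3
one_sided_item = [1] * 120 + [2] * 60 + [3] * 20  # mostly disagreement

for name, ratings in (("balanced", balanced_item), ("one-sided", one_sided_item)):
    ci = bootstrap_mean_ci(ratings)
    print(f"{name}: CI = [{ci[0]:.2f}, {ci[1]:.2f}], flagged = {is_flagged(ci)}")
```

An item whose ratings are centred on the scale midpoint is not flagged, whereas an item on which most respondents expect the same position is flagged, which mirrors the logic used to identify the potentially problematic opinion items.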

Discussion

Research Question 1: Replications

Before investigating novel research questions, it was considered appropriate to assess whether the experimental procedures used in the current study produced data in line with previous lie detection studies. The results demonstrated that on average sender detectability was significantly greater than chance performance (participant-senders were somewhat detectable). The results also demonstrated that on average sender credibility was significantly greater than that expected if there was no veracity effect (participant-senders tended to be believed). This suggests that the experimental procedures used in the current study have convergent validity with those used in previous lie detection research.

Research Question 2: Self-Report Measures

The results from the analyses of the self-report measure of behavioural control indicated that cueing participant-senders to their credibility had the intended effect (increased levels of behavioural control), with the mean level of self-reported behavioural control for the informed truth condition significantly higher than the mean level of self-reported behavioural control for the naïve truth condition. The means and standard deviations for the self-reported behavioural control scores indicated that when participant-senders were not cued to their credibility (in the naïve truth condition), the majority of the participant-senders reported engaging in ‘no more’ to ‘slightly more’ behavioural control than they would in a typical conversation. When the participant-senders were cued to their credibility (in the informed truth condition), however, they reported engaging in ‘moderately more’ to ‘substantially more’ behavioural control than they would in a typical conversation. This increase accounted for around 65% of the variability in the self-report ratings of behavioural control for the naïve truth and informed truth conditions, indicating that when the participant-senders were misled about the true purpose of the recordings, most took their credibility for granted and engaged in what they regarded as near normal levels of behavioural control. When they were told that their credibility was at stake and were instructed to try to appear as credible as possible, however, the majority of the participant-senders reported engaging in what they considered to be near maximum levels of behavioural control.
While the self-report behavioural control scores for the naïve truth and informed truth conditions were considerably different (the effect size was very large), participant-senders reported that they engaged in approximately the same amount of behavioural control in the deceptive condition as they did in the informed truth condition. This result is consistent with the idea that simply cueing participants to their credibility is sufficient to induce near maximum levels of behavioural control, with the lack of a significant difference between the informed truth and deceptive conditions reflecting a ceiling effect in the degree to which participant-senders can actually engage in behavioural control. Alternatively, the lack of a significant difference between these conditions may also reflect a ceiling in the self-report scale itself. Such a ceiling may have been introduced by the relative framing of the self-report question (relative to a typical conversation) and the fact that the top anchor of the self-report scale was “substantially above average”. While it may have been the case that the participant-senders experienced substantially more behavioural control in both the deceptive and informed truth conditions than they would in a typical conversation, this finding does not necessarily mean that they experienced the same amount of behavioural control across the deceptive and informed truth conditions. A different result may have been observed if the participant-senders rated their experiences of behavioural control on an absolute scale. That is, any score above 7 (for example) on a scale that ranges from 0 (no behavioural control) to 10 (maximum behavioural control) may be regarded as substantially more behavioural control than a typical conversation, yet a score of 10 would still reflect more behavioural control than a score of 7. While the data from the current study cannot rule out this possibility, it is largely consistent with the idea that when truth-tellers are aware that their credibility is at stake and are concerned with creating a credible impression, they engage in the same amount of behavioural control that liars engage in.

In addition to increasing the amount of behavioural control participant-senders reported experiencing, cueing participant-senders to their credibility also increased the amount of cognitive load they reported experiencing. The means and standard deviations for the self-report cognitive load scores indicated that when the participant-senders were not cued to their credibility (in the naïve truth condition), the majority reported experiencing ‘no more’ cognitive load than they would in a typical conversation.
When they were cued to their credibility (in the informed truth condition), however, the majority reported experiencing ‘slightly more’ cognitive load than they would in a typical conversation. These results are consistent with the idea that purposefully controlling one’s expressive behaviours is cognitively demanding. The additional task of producing a false opinion also appeared to increase the amount of cognitive load participant-senders reported experiencing above and beyond that associated with simply controlling their expressive behaviours, with the means and standard deviations for the self-report cognitive load scores for the deceptive condition indicating that the majority of the participant-senders reported experiencing ‘moderately more’ cognitive load than they would in a typical conversation. Importantly, while simply cueing participant-senders to their credibility explained around 11% of the variability in the self-report cognitive load scores for the naïve truth and informed truth conditions, the additional task of producing a false opinion explained around 16% of the variability in the self-reported cognitive load scores for the informed truth and deceptive conditions. These results suggest that while purposefully controlling one’s expressive behaviours is cognitively demanding, the task of producing a false opinion not only increases perceptions of cognitive load further, but also to a greater degree than simply controlling one’s behaviours.

The finding that simply cueing participants to their credibility increased the amount of cognitive load they experienced cannot be explained by a corresponding increase in emotional activation. Indeed, when participant-senders were cued to their credibility they reported experiencing approximately the same amount of emotional activation as when they were not cued to their credibility, with the means and standard deviations for the self-report emotional activation scores indicating that the majority of the participant-senders reported experiencing ‘slightly more’ emotional activation than they would in a typical conversation, regardless of whether or not they were cued to their credibility. When participant-senders had to produce a false opinion, however, they reported experiencing more emotional activation than when they had to tell the truth, with the means and standard deviations for the self-report emotional activation scores indicating that the majority of participant-senders experienced ‘moderately more’ emotional activation than they would in a typical conversation in the deceptive condition. This heightened level of emotional activation may have contributed to the increased cognitive load participant-senders reported in the deceptive condition.

Research Question 3: Executive Control Tasks

At first glance, there appears to be little evidence of a relationship between executive abilities and sender detectability scores, with no statistically significant findings observed in the multiple regression analyses. A closer inspection of the 95% confidence intervals, however, suggests that the results were more likely to be observed if the alternate hypothesis was true (there were indeed small relationships present in the population) than if the null hypothesis was true. Specifically, when the naïve detectability scores were regressed onto the outcomes from the executive control tasks, poor performances on the WCST and the n-back task were associated with slightly higher naïve detectability scores. The point estimates for these relationships indicated that the effects were small to intermediate in size, with the lower bounds of their respective 95% confidence intervals only marginally below zero. Furthermore, poor performances on the n-back task were also associated with slightly higher informed detectability scores, with the point estimate for the effect approximately the same size as that observed when the naïve detectability scores were regressed onto the outcomes from the executive control tasks. The WCST, on the other hand, only appeared to be associated with the naïve detectability scores and not the informed detectability scores. Importantly, while the data obtained in the current study do provide more support for the alternate hypothesis than for the null hypothesis, the level of support is not sufficient to warrant firm conclusions in either direction.

It is important to point out that the current study suffers from several limitations that may have attenuated the estimated strength of the relationships between the outcomes from the executive control tasks and the sender detectability scores. First, the participant-sender sample size may not have afforded much precision in terms of estimating effect sizes. To investigate this possibility, a post hoc power analysis was conducted to estimate the statistical power of a three predictor variable equation with a sample size of 52 at three different levels of effect: small (r = .2), intermediate (r = .35), and large (r = .5). The alpha level used in the analysis was .05. The post hoc power analyses revealed that the statistical power for the current study was .30 for detecting a small effect, .75 for an intermediate effect, and .98 for a large effect.
This means that if the relationships between the outcomes from the executive control tasks and the sender detectability scores were small to modest in size, the current study was not particularly well placed to detect such effects (Cohen, 1992). Considering the less than ideal statistical power of the current study, the finding of several small relationships is encouraging given the measurement error associated with both ability indicators. Specifically, while the messages produced in the current study did demonstrate some degree of intra-individual consistency, from a psychometric perspective the reliability between equivalent messages (messages of the same type) was lower than recommended levels (Nunnally & Bernstein, 1994). This means that the detectability scores used in the multiple regression analyses may have been poor representations of each participant-sender’s true ability to deceive, resulting in an increased amount of the total variance in detectability scores being attributed to measurement error, and thus attenuated effect size estimates. One possible explanation for the low reliability between equivalent messages is that the messages were too short (average length of 23.41 seconds, SD = 10.31) and did not allow a sufficient amount of the participant-senders’ demeanour to shine through. Furthermore, while the task of producing false opinions was rated by the participant-senders as moderately more cognitively demanding than a typical conversation, it may be the case that, in an absolute sense, the task was not sufficiently demanding to reliably induce signs of cognitive strain in the majority of the participant-senders.

Another factor that may have contributed to the low reliability of the sender detectability scores pertains to the opinion items that were used in the current study. The results from the participant-receiver online opinion survey indicated that for 5 of the 20 opinion items used in part one (the false opinion task), participant-receivers thought that the majority of Australian university students would hold a position in agreement or disagreement (depending on the item). As such, the participant-receivers may have based their judgements (truth vs. lie) on the perceived probability of a certain response (agreement or disagreement) rather than on the verbal and nonverbal behaviours of the senders. This means that the hit and false alarm rates associated with these 5 items may have systematically varied from those associated with the other items, thus adding to the total variance in sender detectability scores.

In addition to the measurement error associated with the detectability scores, the outcomes from the executive tasks may have also been poor representations of each participant-sender’s true executive abilities.
Miyake and Friedman (2012) note that measuring individual differences in executive control abilities is a difficult task, primarily due to the task-impurity problem. They argue that any score derived from an executive control task necessarily includes systematic variance attributable to other cognitive mechanisms associated with that specific task context. This means that any observed differences in task outcomes may not necessarily reflect differences in executive control abilities per se; rather, they may be due to differences in non-executive processes. As only a single indicator was used to represent each executive ability in the current study, the amount of variance attributable to non-executive processes cannot be estimated and partialled out.

Given the somewhat limited statistical power of the current study and the measurement error associated with both indicators of ability, it is encouraging that several small relationships were observed. The level of evidence, however, remains insufficient to draw any firm conclusions. To more thoroughly investigate whether the executive demands of deception are causally relevant to the diagnosticity of behavioural displays, future research would need to remedy or minimise these problems. The next chapter presents a subsequent empirical study in which these issues have been addressed in the design of the study and the data analysis methods.

Chapter 6: Re-investigating the executive demands of deception ______

Chapter 6
Study 2 - Controlling Measurement Error: Re-investigating the Executive Demands of Deception

The previous study sought to investigate whether the executive demands of deception are causally relevant to the diagnosticity of behavioural displays. While the results were encouraging, with several of the predicted relationships observed, the experimental procedures used in the previous study suffered from several limitations that restricted the conclusions that could be drawn from the results. Importantly, while the previous study assessed the amount of performance variability in the false opinion task that was explained by the simple executive control tasks, the estimates obtained from the analyses assumed that the measures of sender detectability and executive control abilities were largely free from measurement error. When this assumption was examined, however, it appeared to be untenable.

As the results from the previous study were inconclusive, the current study aims to more thoroughly investigate whether the executive demands of deception are causally relevant to the diagnosticity of behavioural displays by using more robust experimental procedures and data analysis methods. Specifically, to reduce the measurement error associated with the estimates of executive abilities, participant-senders in the current study will complete three sets of executive control tasks (nine tasks total). The first set will contain three tasks designed to measure working memory, the second will contain three tasks designed to measure inhibitory control, and the third will contain three tasks designed to measure set shifting. As the three tasks within each set are designed to measure the same executive ability, the covariances between the tasks can be used to estimate each participant-sender's score on a latent variable.
This technique reduces the task-impurity problem by partialling out the task-specific variance that is unique to any individual task. The latent variables will therefore reflect purer estimates of each participant-sender's true executive abilities, thereby reducing measurement error.

The current study also aims to reduce the measurement error associated with the estimates of sender detectability. Specifically, the false opinion task in the current study will only include opinion items where a response in agreement is perceived by participant-receivers to be approximately as likely as a response in disagreement. That is, the revised opinion survey will only include the items from the participant-receiver online opinion survey used in the previous study that had a high probability of having a population mean between 2.5 and 3.5. This will reduce the amount of variance that is associated with the opinion items themselves, thereby reducing the measurement error associated with the sender detectability scores.

To further reduce the measurement error associated with the estimates of sender detectability, the interviews in the current study will include several probe questions designed to increase task involvement and the cognitive complexity of the false opinion task. These probe questions were designed to make each participant-sender elaborate on their initial response, thus providing participant-receivers with a more detailed representation of each participant-sender's expressive behaviours across the different conditions. Previous research has shown that the addition of probe questions tends to increase the rate of truthful judgements but not the degree to which truthful and deceptive messages are discriminated between (Bond, Malloy, Thompson, Arias, & Nunn, 2004; Buller, Comstock, Aune, & Strzyzewski, 1989; Buller, Strzyzewski, & Comstock, 1991; Levine, Park, & McCornack, 1999; Stiff & Miller, 1986). While it has been argued that this effect occurs independently of any changes that might occur in sender behavioural displays (see Levine & McCornack, 2001), if the experimental procedures used in the previous study did not provide a sufficient sample of the participant-senders' behavioural displays for the participant-receiver judgements to be reliable, then the addition of probe questions may simply serve to increase the amount of information available in the recordings. In other words, we do not expect the probe questions to necessarily affect the true detectability of the participant-senders, but rather to decrease the amount of measurement error associated with the veracity judgements.
In addition to more precise estimates of sender detectability and executive abilities, the current study also includes three supplementary self-report measures. These measures were included to investigate whether the self-report measures in the previous study were subject to ceiling effects. To investigate this possibility, the supplementary self-report measures will be presented at the end of the false opinion task such that the participant-senders simultaneously rate their experiences of emotional activation, cognitive load, and behavioural control across the three experimental conditions on an absolute scale, ranging from 0 (no emotional activation/cognitive load/behavioural control) to 100 (maximal emotional activation/cognitive load/behavioural control). This format may be more sensitive to the differences between conditions.

It is anticipated that the changes to the experimental procedures described above will reduce the measurement error associated with the estimates of sender detectability and executive abilities, thus providing a more thorough investigation regarding the executive demands of deception. Furthermore, the current study will also use a larger sample of participant-senders to reduce the uncertainty associated with the effect size estimates. As the current study aims to address the limitations of the previous study, the research questions and hypotheses remain the same.

General Overview

Similar to the previous study, the current three-part study uses two distinct samples of participants: a sample of participant-senders and a sample of participant-receivers. In part one, each participant-sender was video and audio recorded while producing three (rather than six) messages regarding controversial socio-political issues. Of these three messages, one was produced under conditions where the participant-senders were unaware that their messages were going to be evaluated for veracity and where they were given no instructions on how they should behave during the interviews (naïve truth condition), one was produced under conditions where the participant-senders were informed that their messages were going to be evaluated for veracity and where they were instructed to try to appear as credible as possible (informed truth condition), and one was produced under conditions where the participant-senders were instructed to provide false opinions while trying to appear as credible as possible (deceptive condition).

As in the previous study, participant-senders completed three self-report measures after each message. These measures were designed to assess how much emotional activation, cognitive load, and behavioural control each participant-sender experienced during the preceding interview. Unlike the previous study, participant-senders also completed three supplementary self-report measures at the end of the false opinion task (once they had produced all three messages). These additional self-report measures were designed to further assess how levels of emotional activation, cognitive load, and behavioural control varied across the three experimental conditions.
In part two, rather than completing three executive control tasks, participant-senders completed three sets of executive control tasks (nine tasks total), with each set designed to assess a certain aspect of their executive abilities (working memory updating, inhibitory control, or set shifting). As in the previous study, part three sought to measure participant-sender performance in the deception task by having a sample of participant-receivers evaluate the messages produced in the first part of the study for veracity (truth vs. lie). These judgements were again used to create two sets of sender detectability scores (naïve detectability scores and informed detectability scores).

Method

Participant-Senders

Ninety-five undergraduate psychology students from the University of New South Wales served as participant-senders and received partial course credit for their participation. Of the 95 participant-senders, 65 (68.42%) identified themselves as female and 30 (31.58%) as male. Their mean age was 21.69 (range 18-66, SD = 2.98). Participant-senders completed part two approximately one week after they had completed part one. Together, the two parts took approximately 3 hours to complete.

Participant-Receivers

One thousand, one hundred and forty workers from Mechanical Turk served as participant-receivers and received US$0.50 for their participation. All of the participant-receivers indicated that they resided in the United States. Of the 1140 participant-receivers, 612 (53.68%) identified themselves as female and 528 (46.32%) as male. The mean age was 29.10 (range 18-67, SD = 10.19). Of the sample, 812 (71.23%) indicated that they most strongly identify as Caucasian, 137 (12.02%) as African American, 64 (5.61%) as Hispanic and 127 (11.14%) as other ethnic or cultural backgrounds. The majority of the sample indicated that they were native English speakers (n = 1071, 93.95%), with the 69 (6.05%) non-native English speakers indicating that they had been speaking English for an average of 16.11 years (range 8-45, SD = 6.32). Participant-receivers completed part three of the study, which took approximately 15 minutes to complete.

Materials and Procedure: Part One

Revised false opinion task. As in the previous study, participant-senders were misleadingly recruited to a study investigating the cognition of public opinion. Upon arrival, participant-senders were provided with a misleading consent form that told them that the purpose of the study was to investigate how certain aspects of cognition contribute to the formation of people's opinions. They were told that they would be required to complete a short paper and pencil opinion survey followed by several short interviews. They were also told that the interviews would be video and audio recorded and that during the interviews they would be required to provide a verbal justification regarding some of their responses on the opinion survey. Importantly, all the participant-senders were told that the recordings were for transcription purposes only and that nobody except the researchers involved in the study would be working with the materials. They were told that the purpose of the transcriptions was to convert their verbal accounts to a quantifiable data format. This deception was necessary to justify the recordings while not cueing participant-senders to the fact that the veracity of their messages would be evaluated during a later stage of the experiment.

After consenting to the misleading procedure, participant-senders were provided with a revised paper and pencil opinion survey. While the opinion items in the previous study were not screened for whether the participant-receivers would perceive a participant-sender response in agreement to be approximately as likely as a participant-sender response in disagreement, the opinion items in the current study were screened against this criterion. This screening process used the data collected in part three of the previous study (the data from the online opinion survey that was completed by the participant-receivers in the previous study) and involved estimating the location of the average receiver response (a population level estimate) for each of the 30 opinion items. The uncertainty associated with these estimates was quantified by bootstrapping 95% confidence intervals for each of the average receiver responses.
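The bootstrap step can be illustrated with a minimal Python sketch. It assumes a percentile bootstrap and a 5-point response scale; the function names, the number of resamples, and the example ratings are hypothetical and are not taken from the study. The retention check uses the bounds applied in the screening (an interval lying wholly between 2.5 and 3.5).

```python
import random
import statistics


def bootstrap_mean_ci(ratings, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean rating.

    Assumption: a simple percentile bootstrap; the study does not state
    which bootstrap variant or how many resamples were used.
    """
    rng = random.Random(seed)
    n = len(ratings)
    # Resample the ratings with replacement and record each resample mean.
    means = sorted(
        statistics.fmean(rng.choices(ratings, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def is_balanced(ratings, low=2.5, high=3.5, **kwargs):
    """Retain an item only if the whole interval lies between low and high."""
    lower, upper = bootstrap_mean_ci(ratings, **kwargs)
    return lower > low and upper < high


# Hypothetical receiver ratings for two items on a 1-5 scale.
balanced_item = [1, 2, 3, 4, 5] * 40   # responses spread around the midpoint
skewed_item = [4, 5, 5, 4, 3] * 40     # most receivers expect agreement

print(is_balanced(balanced_item))  # True
print(is_balanced(skewed_item))    # False
```

Because each interval is computed independently, the sketch mirrors the unadjusted (per-item) intervals used in the screening rather than a multiplicity-corrected procedure.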
If the average receiver response had a high probability of being between 2.5 and 3.5 (if the lower bound of the 95% confidence interval was greater than 2.5 and the upper bound was smaller than 3.5), the opinion item was included in the revised opinion survey used in the current study; otherwise it was excluded. As this screening process involved estimating 30 average receiver responses, it is important to point out that the bounds of the confidence intervals were not adjusted to account for the multiplicity of tests. This was because the false rejection of a truly balanced opinion item (an opinion item where a participant-sender response in agreement is perceived to be approximately as likely as a participant-sender response in disagreement) would have had little consequence, provided it was replaced with a balanced item. In other words, statistical power was deemed to be more important than controlling the Type I error rate. The complete revised opinion survey is presented in Appendix D.

Once participant-senders had completed the revised opinion survey, the researcher entered each participant-sender's strongly endorsed opinion items (items where the 'strongly agree' or 'strongly disagree' response was selected) into an online number randomizer. Four strongly endorsed opinion items were then randomly selected to be the focus of the subsequent interviews. If a participant-sender did not strongly endorse four opinion items, the endorsed opinion items (items where the 'agree' or 'disagree' response was selected) were entered into the online number randomizer instead. The endorsed opinion items were then selected at random until there was a total of four opinion items (the strongly endorsed items plus the randomly selected endorsed items). The strongly endorsed and randomly selected endorsed items were then randomized again so that they would be randomly distributed over the naïve truth, informed truth, and deceptive conditions.

After completing the revised opinion survey, participant-senders moved on to the interview section of the revised false opinion task. While the participant-senders in the previous study produced six recorded messages (two per condition), the participant-senders in the current study only produced three recorded messages (one per condition). This was because pilot testing indicated that the messages produced by the revised false opinion task were considerably longer than those produced in the previous study. While this was the intended purpose of the revisions, the increased length of each message had implications for the total length of the veracity judgement task that would be completed by the participant-receivers in part three.
As the current study also sought to increase the participant-sender sample size, having participant-senders produce more than one message per condition would have resulted in an unfeasibly long veracity judgement task or an unfeasibly large participant-receiver sample size. Participant-senders therefore only produced one message per condition.

Similar to the interviews conducted in the previous study, each interview in the current study began with the researcher asking each participant-sender "do you believe that [Australia should build more nuclear power plants]". The text in brackets changed depending on which opinion item was the focus of the interview. Unlike the interviews conducted in the previous study, however, the researcher followed up each participant-sender's initial response with six probe questions. The researcher asked each participant-sender: (1) "how strongly do you hold that opinion", (2) "is this really your true opinion", (3) "can you tell me any more reasons why you believe what you do", (4) "you have told me your opinion, but others might hold the opposite view. What reasons do you think might lead them to hold an opposite opinion to yours", (5) "can you tell me any more reasons why others might hold an opposite opinion to yours", and (6) "how many people do you think hold the opposite opinion to your own". These questions were designed to have each participant-sender elaborate on their initial response, thus providing participant-receivers with a more detailed representation of their expressive behaviours across the different conditions. Furthermore, questions that require senders to argue against their initial position (questions 4 through 6; known as devil's advocate questions) have been shown to elicit less information and longer latency times than opinion eliciting questions (questions 1 through 3) when the message is truthful but more information when the message is deceptive (Leal, Vrij, Mann, & Fisher, 2010). Receivers also appear to be sensitive to some of these differences, with truthful responses to devil's advocate questions rated as less immediate, plausible, and emotionally involved than truthful responses to opinion eliciting questions. The inclusion of devil's advocate questions in the current study may therefore increase the detectability of participant-senders.

After completing a practice interview, participant-senders completed a recorded interview where they were given no further instructions on how to answer. This first interview constitutes the naïve truth condition. Similar to the previous study, once participant-senders had completed the naïve truth condition they were debriefed and the true aims of the study revealed.
They were told that the true purpose of the recordings was to assess their ability to appear credible and that each recorded interview would be edited and shown to an independent sample of participant-receivers. They were told that the participant-receivers would be asked to discern the veracity of each message by analysing the verbal and nonverbal behaviours of the interviewees. Participant-senders were also told that out of the next two recorded interviews they would be required to provide one true opinion and one false opinion (an opinion contrary to their original response on the opinion survey). Finally, participant-senders were told that they should attempt to appear as credible as possible during the remaining interviews, regardless of whether they were providing their true opinion or a false opinion, and that a $100 gift voucher to the University of New South Wales bookstore would be awarded to the

participant-sender who was rated as most credible across the remaining interviews. Participant-senders were then provided with an alternate consent form and asked if they were willing to continue with the study. They were told that if they did not wish to continue they would be excused from the rest of the study and awarded full credit. All participant-senders continued with the study.

Once they had re-consented to the study, participant-senders were provided with two cue cards (one lie and one truth card). They were instructed to shuffle the cards and place them face down on the table behind them. At the outset of each of the remaining interviews, participant-senders were asked to turn over the top cue card and respond to the following question in the manner named on the card. This was necessary to control for experimenter expectancies and to randomize the order of the remaining interviews. Participant-senders were asked to place the card in a separate pile on the table behind them so as to keep track of the veracity of each message. The remaining interviews proceeded as previously described. The revised false opinion task took approximately one hour to complete.

Self-report measures. As in the previous study, participant-senders completed three computer-administered self-report measures immediately after each interview. These self-report measures were designed to quantify their experiences of emotional activation, cognitive load, and behavioural control during the preceding interview (three repeated measures on each of the three different self-report scales). These self-report measures were the same as those used in the previous study. Specifically, the self-report measure of emotional activation asked "relative to a typical conversation, how emotionally activated (nervous, anxious, etc.) were you during the preceding interview".
The self-report measure of behavioural control asked “relative to a typical conversation, to what extent did you try to purposefully control your expressive behaviours (where you looked, the placement of your arms, what you did with your hands, the expression on your face, etc.) during the preceding interview”. The self-report measure of cognitive load asked “relative to a typical conversation, how cognitively demanding (had to think hard, had difficulty answering the question, etc.) did you find the preceding interview”. Responses to each question were made on a 7-point sliding scale (see Appendix B). The anchors of the scales were: (1) substantially below average, (2) moderately below average, (3) slightly below average, (4) no more than average, (5) slightly above average, (6) moderately above average, and (7) substantially above average. The sliding

scales were centred at the middle and required participant-senders to move the sliders before a response would be recorded. Once participant-senders moved a slider, they were continuously presented with the slider's position to two decimal places.

Unlike the previous study, participant-senders in the current study also completed three supplementary self-report measures at the end of the revised false opinion task (once they had recorded all three messages). These supplementary self-report measures were included to investigate whether the self-report measures used in the previous study were subject to ceiling effects. Each supplementary self-report measure presented participant-senders with three sliding scales: one for rating their experiences in the naïve truth condition, one for rating their experiences in the informed truth condition, and another for rating their experiences in the deceptive condition. The three sliding scales were always presented simultaneously on the one screen. This feature sought to encourage relative comparisons across the three conditions. The location of each slider was numerically presented to participant-senders and ranged from 0 (the leftmost point) to 100 (the rightmost point). The leftmost point was labelled "no amount of [emotional activation]" whereas the rightmost point was labelled "maximal amount of [emotional activation]". The text in brackets changed depending on whether the question pertained to emotional activation, cognitive load, or behavioural control. Each slider was initially centred at the middle point (50) and had to be moved before a response would be recorded. The supplementary self-report measures are presented in Appendix E.

Materials and Procedure: Part Two

Revised battery of executive control tasks. To obtain a more accurate estimate of each participant-sender's true executive abilities, each participant-sender returned approximately one week after completing part one of the current study to complete three sets of executive control tasks (a total of nine tasks). One of the sets contained three tasks considered to measure working memory updating: a reverse digit span task (Conway, Kane, Bunting, Hambrick, Wilhelm, & Engle, 2005; Schmeichel, 2007), a running memory span task (Bunting, Cowan, & Saults, 2006; Broadway & Engle, 2010), and an n-back task (Jaeggi et al., 2003; Jaeggi et al., 2010; Owen et al., 2005). Another set contained three tasks considered to measure inhibitory control: a cue-dependent go/no-go task (Fillmore, 2003), a Stroop task (MacLeod, 1991; Stroop, 1935), and a stop signal task (Carter, Farrow, Silberstein, Stough, Tucker, & Pipingas, 2003; Logan, 1994). The third set contained three tasks considered to measure set shifting: a WCST (Berg, 1954; Miyake et al., 2000), a trail making test (Arbuthnott & Frank, 2000; Sanchez-Cubillo et al., 2009), and a plus/minus task (Miyake et al., 2000). While the WCST and n-back tasks were the same as those described in the previous chapter, the total number of trials on the cue-dependent go/no-go task was increased to 400 (60% more trials). This was done to give participants a greater opportunity to make errors with the aim of raising the ceiling of the test. The order of task administration was held constant so that the relative differences across participant-senders' executive performances would not include practice/order/depletion effects. The revised executive control task battery took approximately two hours to complete. The details of the additional executive control tasks are described below.

Plus/minus task. Participants were told that they would be presented with three columns of numbers and that their task was to either add or subtract the number 7 from each number in the columns. They were told to add 7 to the numbers in the first column, subtract 7 from the numbers in the second column, and alternate between adding and subtracting the number 7 from the numbers in the third column. They were told to work as quickly as possible without making any errors and that their completion time would be recorded for each column. Each column contained 30 numbers with each number from 10 to 99 used once only. The numbers were randomly assigned to one of the three columns; however, presentation order was the same for all participants. The dependent measure was calculated by subtracting the mean time of the first two columns from the time on the third column.

Trail making test.
Participants were presented with a part A sample sheet of the trail making test, face down, and told that their task was to connect a series of numbered circles in ascending order as fast as possible without making any errors. They were told that if they made an error they were to return to the last numbered circle where the error originated and continue and that their performance would be measured with a stopwatch. After the practice trial they completed part A of the trail making test which contained 25 circles with numbers. Once they had completed part A they were presented with the part B sample, face down, and told that the next stage of the test contained both numbers and letters and that their task was to connect the circles by alternating between the numbers and letters, starting with number 1. After the practice they completed part

B of the trail making test which contained 12 circles with numbers and 13 circles with letters. The dependent measure was the time taken on part B minus the time taken on part A.

Running-memory span task. Participant-senders were told that they would be presented with a series of letters one after the other and that their task was to remember and later recall a certain number of the most recent letters in the same order as they were presented. They were also told that the number of letters that they would be required to recall (target length) would vary between blocks of trials while the number of letters that they were not required to recall (distractor length) would vary within blocks of trials. Participants were told at the start of each block how many letters they must recall for that block of trials. They then completed a practice block containing 4 trials (target length = 2, distractor length = 0 through 3) before completing the primary task. The primary task consisted of 6 blocks of trials with each block containing 6 trials (36 trials total). Each block had a different target length (3 through 8). The order of block presentation was randomized for each participant. Within each block, three trials had a distractor length of 0 (whole recall trials) while the remaining three trials had distractor lengths of 1, 2, and 3 (partial recall trials). Distractor length was randomized within blocks. Letters were presented sequentially in the centre of the screen in black against a grey background for 300 ms each. The interstimulus interval was 200 ms. After each series of letters had been presented, participants made their responses via a 3 × 4 grid displaying all the letters from the set of possible letters. The grid also contained a blank button. Participants were reminded that they must recall the letters in the order in which they were presented and that they should use the blank button for each letter they forgot.
The dependent measure was the number of letters from partial recall trials that were correctly recalled in the correct serial position.

Reverse digit span task. Participant-senders were presented with a sequence of white digits, shown one at a time in the centre of a black background. Each digit was presented for 1000 ms with a 250 ms interstimulus interval. Once the appropriate number of digits (starting at level 3) had been presented, participant-senders were presented with a text entry box in the middle of the screen where they were prompted to recall the presented digits in reverse order. If the response was correct (in digits and presentation order) an additional digit was added to the length of the next trial. If the response was incorrect the same number of digits was presented a second time. The task ended after level 14 or after the participant-sender made two consecutive incorrect responses. The dependent measure was the highest level at which the participant-sender gave a correct response.

Stroop task. Participants were told that they were about to view a series of stimuli made up of words and asterisks presented in different colours and that their task was to indicate the colour of each stimulus by pressing corresponding computer keys (d for red, f for green, j for blue, and k for black stimuli). They were told to ignore what the words actually say and to work as quickly as possible without making any errors. At the start of each trial a white screen was presented for 200 ms, after which a coloured stimulus was presented in the middle of the white screen. The stimulus was left on-screen until a response occurred. If a participant made an incorrect response a black cross was presented in the middle of the screen for 400 ms. The task consisted of 144 randomly intermixed trials: 72 control trials with a string of asterisks printed in one of the four colours, 60 incongruent trials with a colour word printed in a different colour (e.g. red printed in blue), and 12 congruent trials with a colour word printed in the same colour (e.g. red printed in red). After completing 16 practice trials (8 incongruent and 8 control trials), participants completed the primary task. The dependent measure was the average reaction time for the incongruent trials minus the average reaction time for the control trials.

Stop signal task. Participants were told that they were about to view a series of arrows pointing either to the left or to the right and that their task was to indicate the direction of each arrow by pressing corresponding computer keys (d for left and k for right arrows).
They were told that some arrows would be followed by a sound indicating that they were to withhold their response on that trial. They were also told that while the sounds would occur at various delays, they should not wait for a stop signal to occur as some of the stop signals depended upon participant responses. Participants were then given a pair of headphones and began a practice phase consisting of 32 trials (24 go trials and 8 stop trials). After the practice phase participants completed three blocks of 64 trials (48 go trials and 16 stop trials). Between blocks participants received information about their performance in the previous block. At the start of each trial a white circle was presented in the middle of the screen on a black background. After 250 ms either a left- or right-facing arrow appeared in the middle of the circle. The arrows remained on the screen until the participant responded or until 1250 ms had elapsed. The inter-stimulus interval was 750 ms. On stop trials an audio tone was presented after a variable delay. The initial delay was 250 ms. When participants successfully inhibited a response the delay was increased by 50 ms. When participants failed to inhibit a response the delay was decreased by 50 ms. The dependent measure was calculated by subtracting the mean stop signal delay from the mean reaction time for go trials.
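The tracking rule and scoring for the stop signal task can be illustrated with a minimal Python sketch. The function names, the lower bound on the delay, and the trial data are hypothetical and are not taken from the task software used in the study; only the 250 ms starting delay, the 50 ms step, and the scoring rule come from the description of the task.

```python
from statistics import mean


def next_stop_signal_delay(current_delay_ms, inhibited, step_ms=50, floor_ms=0):
    """Return the delay for the next stop trial.

    The delay grows by one step after a successful stop and shrinks by one
    step after a failed stop. floor_ms is an assumed lower bound.
    """
    if inhibited:
        return current_delay_ms + step_ms
    return max(floor_ms, current_delay_ms - step_ms)


def stop_signal_score(go_rts_ms, stop_signal_delays_ms):
    """Mean go reaction time minus mean stop signal delay."""
    return mean(go_rts_ms) - mean(stop_signal_delays_ms)


# Hypothetical outcomes of five stop trials (True = response withheld).
delays = [250]  # the initial delay was 250 ms
for inhibited in [True, True, False, True, False]:
    delays.append(next_stop_signal_delay(delays[-1], inhibited))

delays_presented = delays[:-1]  # the final value belongs to a trial that was never run
go_rts = [520, 480, 510, 495, 505]  # hypothetical go-trial reaction times in ms
score = stop_signal_score(go_rts, delays_presented)
```

Because the delay moves up after each successful stop and down after each failure, the procedure tends to settle near the delay at which a participant inhibits about half of the time, which is why the mean delay is a useful summary in the subtraction.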

Materials and Procedure: Part Three

Online veracity judgement task. Participant-receivers completed part three of the current study. The online veracity judgement task was identical to that described in the previous chapter, except that participant-receivers in the current study evaluated only five messages (rather than ten). This change was necessary as the messages produced in the current study were considerably longer than those produced in the previous study (average message length = 135.10 seconds, SD = 44.09). To maintain a high number of receiver judgements per message (20) while minimising fatigue effects, the current study had each participant-receiver evaluate fewer messages but recruited a larger number of participant-receivers.

Participant-receivers were first presented with an online consent form and instructions explaining that they were about to view five recorded interviews of people stating various opinions. They were told that while some of these people were stating their true opinions, others were stating false opinions. They were asked to evaluate each interviewee’s verbal and nonverbal behaviours and to indicate whether they thought each interviewee was being truthful or deceptive. After reading the instructions and providing their consent, participant-receivers were presented with an audio validation screen. This screen consisted of a blank text entry field and an audio recording explaining that in order to proceed they must enter the validation word ‘CAT’ into the text entry field. This ensured that all participant-receivers had functioning audio systems and were able to hear the participant-sender messages.

Once they entered the correct validation word, participant-receivers were pseudo-randomly presented with five participant-sender messages. The pseudo-random assignment ensured that no participant-sender would appear more than once per sequence, thus eliminating the possibility of participant-senders contradicting themselves across messages and of participant-receivers using this information to inform their veracity judgements. Each trial began with the presentation of a message, during which time all of the participant-receiver’s controls were locked. Once each message had finished playing, the task automatically proceeded to the response screen where the participant-receiver could indicate whether they thought the preceding message was truthful or deceptive via a forced choice response (truth vs. lie). After each participant-receiver had completed all five trials, a ‘catch trial’ was presented. This consisted of a video-audio recording of the researcher stating that “In order to ensure data integrity, please select the lie response on the next screen”. This trial served to identify participant-receivers who were not paying close attention or were responding randomly (Paolacci et al., 2010). No participant-receivers failed the catch trial. The online veracity judgement task took approximately 15 minutes to complete.
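The constraint that no participant-sender appears more than once per sequence can be sketched as follows. This is a minimal illustration under assumed names and data structures, not the assignment routine used in the study, and it does not attempt to balance the number of judgements collected per message.

```python
import random


def assign_messages(messages_by_sender, n_messages=5, rng=None):
    """Draw n_messages messages, each from a different sender.

    messages_by_sender is a hypothetical mapping from a sender identifier
    to the list of that sender's message identifiers.
    """
    rng = rng or random.Random()
    # Sampling senders without replacement guarantees no sender repeats.
    senders = rng.sample(sorted(messages_by_sender), n_messages)
    return [(sender, rng.choice(messages_by_sender[sender])) for sender in senders]


# Hypothetical pool: 10 senders with three messages each.
pool = {
    f"sender_{i}": [f"sender_{i}_naive_truth", f"sender_{i}_informed_truth", f"sender_{i}_lie"]
    for i in range(10)
}
sequence = assign_messages(pool, rng=random.Random(1))
```

Sampling senders first and messages second is what prevents a receiver from seeing the same sender state contradictory opinions.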

Results

Research Question 1: Replications

Similar to the previous study, to assess whether the experimental procedures used in the current study replicated some of the findings reported in the deception literature (senders tend to be somewhat detectable), naïve and informed detectability scores were compared against hypothetical null distributions via two one-sample t-tests. As each comparison pertained to the same research question, the significance of each comparison was evaluated against an adjusted alpha level of .025. The tests indicated that the mean naïve detectability score (M = 0.51, SD = 0.41, 95% CI [0.43, 0.60]) was significantly greater than that expected by chance, t(94) = 12.15, p < .005, d = 1.24, 95% CI [0.98, 1.51]. This result suggests that senders tend to be perceived as more credible when they are telling the truth and taking their credibility for granted than when they are lying. The tests also indicated that the mean informed detectability score (M = 0.21, SD = 0.46, 95% CI [0.12, 0.31]) was significantly greater than that expected by chance, t(94) = 4.38, p < .005, d = 0.45, 95% CI [0.24, 0.66]. This result suggests that senders tend to be perceived as more credible when they are telling the truth and not taking their credibility for granted than when they are lying.

In addition to assessing whether the experimental procedures used in the current study produced somewhat detectable senders, the research question also pertained to the sender credibility scores. To assess whether the experimental procedures used in the current study produced a veracity effect, naïve and informed credibility scores were also compared against hypothetical null distributions via two one-sample t-tests. The tests indicated that the mean naïve credibility score (M = 0.25, SD = 0.47, 95% CI [0.15, 0.34]) was significantly greater than that expected if there was no veracity effect, t(94) = 5.19, p < .005, d = 0.53, 95% CI [0.31, 0.75]. The tests also indicated that the mean informed credibility score (M = 0.10, SD = 0.44, 95% CI [0.01, 0.19]) was marginally greater than that expected if there was no veracity effect, t(51) = 2.10, p = .04, d = 0.22, 95% CI [0.01, 0.42]. Together, these results suggest that the experimental procedures used in the current study produced a veracity effect. That is, senders tended to be believed.
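The one-sample comparisons reported for Research Question 1 can be approximately reproduced from the summary statistics alone. The sketch below assumes a null value of zero and a sample of 95 senders (inferred from the 94 degrees of freedom); both are assumptions rather than details stated in the text, and small discrepancies from the reported values are expected because the inputs are rounded to two decimal places.

```python
from math import sqrt


def one_sample_t(mean_score, sd, n, null_value=0.0):
    """t statistic for a one-sample test against null_value."""
    standard_error = sd / sqrt(n)
    return (mean_score - null_value) / standard_error


def cohens_d(mean_score, sd, null_value=0.0):
    """Standardized distance of the sample mean from null_value."""
    return (mean_score - null_value) / sd


# Naive detectability: M = 0.51, SD = 0.41, assumed n = 95.
t_naive = one_sample_t(0.51, 0.41, 95)
d_naive = cohens_d(0.51, 0.41)
```

With these inputs the t statistic comes out close to the reported 12.15 and d close to the reported 1.24.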

Research Question 2: Self-Report Measures

To investigate whether the mean levels of self-reported emotional activation, cognitive load, and/or behavioural control varied across the three conditions, six repeated measures ANOVAs were conducted. While all of the models contained a within-subjects factor for condition (naïve truth vs. informed truth vs. deceptive), three of the models specified the original self-report measures as the dependent variables (those completed immediately after each interview on a scale ranging from 1 to 7) whereas the other three models specified the supplementary self-report measures as the dependent variables (those completed at the end of the false opinion task on a scale ranging from 0 to 100). Within each analysis the three planned contrasts (naïve truth vs. informed truth; naïve truth vs. deceptive; informed truth vs. deceptive) were evaluated against a Šidák adjusted alpha level of .017.

Original emotional activation measure. The mean emotional activation scores for the original self-report measure and their respective 95% confidence intervals are presented in Figure 5. The contrast tests revealed that the mean difference between the self-report ratings of emotional activation for the naïve truth and informed truth conditions was not statistically significant, F(1, 94) = 2.38, MSE = 2.77, p = .13, ηp² = .02, 90% CI [.00, .10], with the mean self-report rating for the naïve truth condition only 0.26 units lower (95% CI [-.08, .60]) than the mean self-report rating for the informed truth condition. This means that on average the participant-senders reported experiencing approximately the same amount of emotional activation in the naïve truth condition as in the informed truth condition. By contrast, the mean difference between the self-report ratings of emotional activation for the naïve truth and deceptive conditions was statistically significant, F(1, 94) = 9.77, MSE = 1.47, p < .005, ηp² = .09, 90% CI [.02, .19], with the mean self-report rating for the naïve truth condition 0.39 units lower (95% CI [0.14, 0.64]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing more emotional activation in the deceptive condition than in the naïve truth condition. The mean difference between the self-report ratings of emotional activation for the informed truth and deceptive conditions, however, was not statistically significant, F(1, 94) = 0.80, MSE = 1.90, p = .37, ηp² = .01, 90% CI [.00, .06], with the mean self-report rating for the informed truth condition only 0.17 units lower (95% CI [-0.15, 0.41]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing approximately the same amount of emotional activation in the deceptive condition as in the informed truth condition.
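Two quantities used throughout the contrast tests can be checked with a short Python sketch: the Šidák adjusted alpha level and the partial eta squared implied by each F statistic. The family-wise alpha of .05 is an assumption (it is not stated explicitly), and the function names are illustrative.

```python
def sidak_alpha(family_alpha, n_comparisons):
    """Per-comparison alpha that keeps the family-wise rate at family_alpha."""
    return 1 - (1 - family_alpha) ** (1 / n_comparisons)


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Three planned contrasts per analysis, assumed family-wise alpha of .05.
alpha_per_contrast = sidak_alpha(0.05, 3)

# Naive truth vs. deceptive, original emotional activation measure: F(1, 94) = 9.77.
eta_naive_vs_deceptive = partial_eta_squared(9.77, 1, 94)
```

Rounded to three and two decimal places respectively, these reproduce the alpha level of .017 and the effect size of .09 given for that contrast.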

[Figure 5 image not reproduced. Vertical axis: Emotional Activation Scores (1 to 7). Legend: Naïve Truth Condition, Informed Truth Condition, Deceptive Condition.]

Figure 5. Mean emotional activation scores (original self-report measure) by condition. Error bars represent 95% confidence intervals.

Supplementary emotional activation measure. The mean emotional activation scores for the supplementary self-report measure and their respective 95% confidence intervals are presented in Figure 6. The contrast tests revealed that the mean difference between the self-report ratings of emotional activation for the naïve truth and informed truth conditions was not statistically significant, F(1, 94) = 2.95, MSE = 719.27, p = .09, ηp² = .03, 90% CI [.00, .11], with the mean self-report rating for the naïve truth condition 4.73 units lower (95% CI [-0.74, 10.19]) than the mean self-report rating for the informed truth condition. This means that on average the participant-senders reported experiencing approximately the same amount of emotional activation in the naïve truth condition as in the informed truth condition. By contrast, the mean difference between the self-report ratings of emotional activation for the naïve truth and deceptive conditions was statistically significant, F(1, 94) = 16.65, MSE = 861.90, p < .005, ηp² = .15, 90% CI [.05, .26], with the mean self-report rating for the naïve truth condition 11.13 units lower (95% CI [5.15, 17.11]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing more emotional activation in the deceptive condition than in the naïve truth condition. The mean difference between the self-report ratings of emotional activation for the informed truth and deceptive conditions, however, was not statistically significant, F(1, 94) = 4.48, MSE = 869.81, p = .04, ηp² = .05, 90% CI [.00, .13], with the mean self-report rating for the informed truth condition 6.40 units lower (95% CI [0.40, 12.41]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing approximately the same amount of emotional activation in the deceptive condition as in the informed truth condition.

[Figure 6 image not reproduced. Vertical axis: Emotional Activation Scores (0 to 100). Legend: Naïve Truth Condition, Informed Truth Condition, Deceptive Condition.]

Figure 6. Mean emotional activation scores (supplementary self-report measure) by condition. Error bars represent 95% confidence intervals.

Original cognitive load measure. The mean cognitive load scores for the original self-report measure and their respective 95% confidence intervals are presented in Figure 7. The contrast tests revealed that the mean difference between the self-report ratings of cognitive load for the naïve truth and informed truth conditions was statistically significant, F(1, 94) = 18.40, MSE = 3.94, p < .005, ηp² = .16, 90% CI [.06, .27], with the mean self-report rating for the naïve truth condition 0.87 units lower (95% CI [0.47, 1.28]) than the mean self-report rating for the informed truth condition. This means that on average the participant-senders reported experiencing more cognitive load in the informed truth condition than in the naïve truth condition. Likewise, the mean difference between the self-report ratings of cognitive load for the naïve truth and deceptive conditions was also statistically significant, F(1, 94) = 35.07, MSE = 3.77, p < .005, ηp² = .27, 90% CI [.15, .38], with the mean self-report rating for the naïve truth condition 1.18 units lower (95% CI [0.78, 1.57]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing more cognitive load in the deceptive condition than in the naïve truth condition. Similarly, the mean difference between the self-report ratings of cognitive load for the informed truth and deceptive conditions was marginally significant, F(1, 94) = 5.26, MSE = 1.68, p = .02, ηp² = .05, 90% CI [.00, .14], with the mean self-report rating for the informed truth condition 0.31 units lower (95% CI [0.04, 0.57]) than the mean self-report rating for the deceptive condition. This means that while on average the participant-senders reported experiencing more cognitive load in the deceptive condition than in the informed truth condition, the effect was small and the level of evidence insufficient to make firm conclusions in either direction (more cognitive load vs. no differences in cognitive load).

[Figure 7 image not reproduced. Vertical axis: Cognitive Load Scores (1 to 7). Legend: Naïve Truth Condition, Informed Truth Condition, Deceptive Condition.]

Figure 7. Mean cognitive load scores (original self-report measure) by condition. Error bars represent 95% confidence intervals.

Supplementary cognitive load measure. The mean cognitive load scores for the supplementary self-report measure and their respective 95% confidence intervals are presented in Figure 8. The contrast tests revealed that the mean difference between the self-report ratings of cognitive load for the naïve truth and informed truth conditions was statistically significant, F(1, 94) = 19.05, MSE = 949.80, p < .005, ηp² = .17, 90% CI [.07, .28], with the mean self-report rating for the naïve truth condition 13.80 units lower (95% CI [7.52, 20.08]) than the mean self-report rating for the informed truth condition. This means that on average the participant-senders reported experiencing more cognitive load in the informed truth condition than in the naïve truth condition. Likewise, the mean difference between the self-report ratings of cognitive load for the naïve truth and deceptive conditions was also statistically significant, F(1, 94) = 90.35, MSE = 1360.52, p < .005, ηp² = .49, 90% CI [.37, .58], with the mean self-report rating for the naïve truth condition 35.97 units lower (95% CI [28.46, 43.48]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing more cognitive load in the deceptive condition than in the naïve truth condition. Similarly, the mean difference between the self-report ratings of cognitive load for the informed truth and deceptive conditions was statistically significant, F(1, 94) = 28.24, MSE = 1653.28, p < .005, ηp² = .23, 90% CI [.12, .34], with the mean self-report rating for the informed truth condition 22.17 units lower (95% CI [13.89, 30.45]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing more cognitive load in the deceptive condition than in the informed truth condition.

[Figure 8 image not reproduced. Vertical axis: Cognitive Load Scores (0 to 100). Legend: Naïve Truth Condition, Informed Truth Condition, Deceptive Condition.]

Figure 8. Mean cognitive load scores (supplementary self-report measure) by condition. Error bars represent 95% confidence intervals.

Original behavioural control measure. The mean behavioural control scores for the original self-report measure and their respective 95% confidence intervals are presented in Figure 9. The contrast tests revealed that the mean difference between the self-report ratings of behavioural control for the naïve truth and informed truth conditions was statistically significant, F(1, 94) = 49.86, MSE = 2.20, p < .005, ηp² = .35, 90% CI [.22, .45], with the mean self-report rating for the naïve truth condition 1.07 units lower (95% CI [0.77, 1.38]) than the mean self-report rating for the informed truth condition. This means that on average the participant-senders reported experiencing more behavioural control in the informed truth condition than in the naïve truth condition. Likewise, the mean difference between the self-report ratings of behavioural control for the naïve truth and deceptive conditions was also statistically significant, F(1, 94) = 56.10, MSE = 2.31, p < .005, ηp² = .37, 90% CI [.25, .48], with the mean self-report rating for the naïve truth condition 1.17 units lower (95% CI [0.86, 1.48]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing more behavioural control in the deceptive condition than in the naïve truth condition. In contrast, the mean difference between the self-report ratings of behavioural control for the informed truth and deceptive conditions was not statistically significant, F(1, 94) = 0.83, MSE = 1.02, p = .36, ηp² = .01, 90% CI [.00, .06], with the mean self-report rating for the informed truth condition 0.10 units lower (95% CI [-0.11, 0.30]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing approximately the same amount of behavioural control in the deceptive condition as in the informed truth condition.

[Figure 9 image not reproduced. Vertical axis: Behavioural Control Scores (1 to 7). Legend: Naïve Truth Condition, Informed Truth Condition, Deceptive Condition.]

Figure 9. Mean behavioural control scores (original self-report measure) by condition. Error bars represent 95% confidence intervals.

Supplementary behavioural control measure. The mean behavioural control scores for the supplementary self-report measure and their respective 95% confidence intervals are presented in Figure 10. The contrast tests revealed that the mean difference between the self-report ratings of behavioural control for the naïve truth and informed truth conditions was statistically significant, F(1, 94) = 106.36, MSE = 787.13, p < .005, ηp² = .53, 90% CI [.41, .61], with the mean self-report rating for the naïve truth condition 29.67 units lower (95% CI [23.97, 35.40]) than the mean self-report rating for the informed truth condition. This means that on average the participant-senders reported experiencing more behavioural control in the informed truth condition than in the naïve truth condition. Likewise, the mean difference between the self-report ratings of behavioural control for the naïve truth and deceptive conditions was also statistically significant, F(1, 94) = 142.61, MSE = 753.21, p < .005, ηp² = .60, 90% CI [.50, .67], with the mean self-report rating for the naïve truth condition 33.63 units lower (95% CI [28.04, 39.22]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing more behavioural control in the deceptive condition than in the naïve truth condition. In contrast, the mean difference between the self-report ratings of behavioural control for the informed truth and deceptive conditions was not statistically significant, F(1, 94) = 1.73, MSE = 851.13, p = .19, ηp² = .02, 90% CI [.00, .08], with the mean self-report rating for the informed truth condition 3.94 units lower (95% CI [-2.00, 9.88]) than the mean self-report rating for the deceptive condition. This means that on average the participant-senders reported experiencing approximately the same amount of behavioural control in the deceptive condition as in the informed truth condition.

[Figure 10 image not reproduced. Vertical axis: Behavioural Control Scores (0 to 100). Legend: Naïve Truth Condition, Informed Truth Condition, Deceptive Condition.]

Figure 10. Mean behavioural control scores (supplementary self-report measure) by condition. Error bars represent 95% confidence intervals.

Research Question 3: Executive Control Tasks

To investigate whether the outcomes from the executive control tasks predicted any of the variance in the sender detectability scores and whether the magnitude of the predictive relationships depended on which type of truthful message was used to operationalize detectability (naïve truth vs. informed truth), two series of structural equation models (SEMs) were fit to the data. Like the previous study, the first series of SEMs assessed the amount of variability in the naïve detectability scores that was predicted by the executive control tasks whereas the second series of SEMs assessed the amount of variability in the informed detectability scores that was predicted by the executive control tasks. Before interpreting the SEMs, however, preliminary analyses were conducted to assess the reliability of the outcomes from the false opinion task, the dispersion of the outcomes from the executive control tasks, and their suitability for a latent variable analysis.

To assess the intra-individual consistency of the outcomes from the false opinion task, the inter-correlations between the by-sender aggregated truth scores (the number of truthful responses divided by the total number of responses) for the three message types (naïve truth vs. informed truth vs. deceptive) were examined. Correlational analyses indicated that the truth scores for the naïve truth condition were significantly and positively correlated with those for the informed truth condition, r(93) = .69, p < .005. Likewise, the truth scores for the naïve truth condition were also significantly and positively correlated with those for the deceptive condition, r(93) = .53, p < .005. Finally, the truth scores for the informed truth condition were significantly and positively correlated with those for the deceptive condition, r(93) = .50, p < .005. This pattern of inter-correlations indicates a moderate degree of intra-sender consistency and is an improvement upon the results observed in the previous study.

Before fitting the full series of SEMs, it was important to assess whether each set of executive tasks actually tapped a common underlying executive ability and whether the executive task scores were suitable for a latent variable analysis. Like the previous study, preliminary data analyses were conducted to assess the distributions of the executive task scores, the results of which are presented in Table 8.
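For the consistency check on the false opinion task, the by-sender truth score and the correlation between message types can be sketched in plain Python. The function names are illustrative and no study data are reproduced here; the only figures taken from the text are r = .69 and its 93 degrees of freedom.

```python
from math import sqrt


def truth_score(judgements):
    """Proportion of 'truth' judgements; judgements is a list of booleans."""
    return sum(judgements) / len(judgements)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r, df):
    """t statistic used to test whether a correlation differs from zero."""
    return r * sqrt(df / (1 - r ** 2))


# The naive truth vs. informed truth correlation, r(93) = .69.
t_naive_informed = t_for_r(0.69, 93)
```

A t statistic of this size on 93 degrees of freedom lies far in the tail of the distribution, consistent with the reported p < .005.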


Table 8
Descriptive Statistics for the Battery of Executive Tasks used to Estimate Latent Executive Abilities

Task      Mean         SD      Range              Skewness   Kurtosis
Stroop    103.38 ms    57.12   14.77 – 269.53       0.68      -0.01
SS        266.10 ms    74.79   102.13 – 522.15      0.51       1.81
TMT       20.96 s       9.40   1.71 – 39.67         0.27      -0.78
PM        10.49 s      16.47   -25.75 – 50.17       0.03       0.13
GNG       6.68          3.76   0 – 14               0.15      -1.12
WCST      .16            .07   .02 – .36            0.19      -0.54
RST       38.06        11.64   7 – 67              -0.08       1.17
RDS       5.97          2.12   1 – 11              -0.10      -0.44
N-back    2.03          0.70   1.22 – 3.57          0.72      -0.86

An examination of the descriptive statistics and task distributions indicated that the executive task scores were suitable for further analyses. To assess whether each set of executive tasks actually tapped a common underlying executive ability, the Pearson correlations between the executive task scores were examined. The correlation matrix is presented in Table 9.

Table 9
Pearson Correlation Coefficients Between the Executive Task Scores and Detectability Scores

Task         1       2       3       4       5       6       7       8       9
1. TMT       --
2. WCST      .31**   --
3. PM        .22*    .26**   --
4. RDS       -.11    -.05    .03     --
5. N-back    .28**   .10     .31**   .26**   --
6. RST       .09     .21*    .12     .35**   .44**   --
7. Stroop    .12     .14     .11     .12     .13     .18†    --
8. SS        .18†    .11     -.01    .11     .17†    .24*    .38**   --
9. GNG       .16     .03     .02     .14     .14     .19*    .28**   .25*    --
10. IDS      .01     .08     .07     .07     .21*    .33**   .17†    .12     .14
11. NDS      .23*    .03     .04     .07     .22*    .34**   .17†    .20*    .15

Note. † p < .10, two-tailed. * p < .05, two-tailed. ** p < .01, two-tailed.

The correlation matrix indicated that while the bivariate correlations between the nine executive task scores were generally low, executive tasks that were considered to rely on the same underlying executive ability were all significantly and positively correlated with one another. Specifically, the correlations between the three executive tasks considered to measure set shifting (TMT, WCST, and PM) ranged from r = .22 to r = .31. Similarly, the correlations between the three executive tasks considered to measure inhibitory control (Stroop, SS, and GNG) ranged from r = .25 to r = .38. Finally, the correlations between the three executive tasks considered to measure working memory updating (RDS, N-back, and RST) ranged from r = .26 to r = .44. This pattern of inter-relatedness is consistent with previous research on executive control processes (Miyake et al., 2000) and suggests that the executive tasks considered to tap the same executive ability are sufficiently related to one another to proceed with a latent variable analysis.

As a final step before fitting the full series of SEMs, a confirmatory factor analysis (CFA) was fit to the executive task scores. The CFA was specified according to the full three-factor model proposed by Miyake et al. (2000). Importantly, only a single CFA was fit to the data as the primary aim of the research was to assess the relationship between the latent executive abilities and the sender detectability scores, rather than to investigate the nature of executive abilities themselves. The full three-factor model and the estimated factor loadings are presented in Figure 11.

Figure 11. The full three-factor model used in the confirmatory factor analysis. The ellipses represent the executive abilities (latent variables) whereas the rectangles represent the executive tasks (observed variables) that were considered to tap the specific executive abilities, as indicated by the single-headed arrows. The single-headed arrows have standardized factor loadings next to them whereas the double-headed arrows have correlation coefficients next to them.
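The degrees of freedom of the three-factor model can be counted directly from its specification. The sketch below assumes a standard identification scheme (nine loadings, nine residual variances, and three factor correlations, with factor variances fixed to 1) and assumes that the AIC takes the form of the model chi-squared plus twice the number of free parameters; neither assumption is stated in the text, although both reproduce the values reported for the CFA.

```python
def model_degrees_of_freedom(n_observed, n_free_parameters):
    """Unique variances and covariances minus freely estimated parameters."""
    n_moments = n_observed * (n_observed + 1) // 2
    return n_moments - n_free_parameters


n_tasks = 9    # observed executive task scores
n_factors = 3  # shifting, inhibition, updating

# Assumed parameterisation: one loading and one residual variance per task,
# plus one correlation for each pair of factors.
n_free = n_tasks + n_tasks + n_factors * (n_factors - 1) // 2

df_cfa = model_degrees_of_freedom(n_tasks, n_free)  # 45 - 21
aic_cfa = 29.55 + 2 * n_free                        # chi-squared + 2q (assumed form)
```

Under these assumptions the count gives 24 degrees of freedom and an AIC of 71.55.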


The fit indices for the CFA were all within acceptable limits. Specifically, the model produced a non-significant chi-squared test, χ²(24) = 29.55, p = .20, indicating that the model’s predictions did not significantly deviate from the actual data pattern. Furthermore, whereas the values of the AIC and SRMR were quite low (71.55 and .066, respectively), the IFI and CFI values were suitably high (.943 and .934, respectively). These results suggest that the three-factor model fit the data well.

As the results of the CFA suggest that the executive task scores are sufficiently related to one another to afford the estimation of latent executive abilities, the full series of SEMs was fit to the data. These models assessed the extent to which the latent executive abilities predicted each set of sender detectability scores (naïve detectability scores vs. informed detectability scores). The logic of the analysis centres on a model trimming procedure where alternative models are compared against one another via a series of likelihood ratio tests. First, a higher order model with paths from each of the three latent executive abilities to the observed sender detectability scores was fit to the data (a full three-path model). This model was then compared to lower order nested models where one or more of the three path coefficients were constrained to equal zero (three two-path models and three one-path models). When the fit of the higher order model was statistically better than that of the lower order model, the constrained path coefficient was retained as it uniquely explained some of the variability in the sender detectability scores. When the fit of the higher order model was not statistically better than that of the lower order model, the constrained path coefficient was abandoned as it did not contribute to the prediction of sender detectability scores. For each set of sender detectability scores, this procedure resulted in eight hierarchically structured models being compared against one another. The fit indices and standardized regression coefficients for the models predicting the naïve detectability scores are presented in Table 10.
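The likelihood ratio test at the heart of the model trimming procedure is a chi-squared difference test between a model and a nested, more constrained version of it. The following sketch uses the chi-squared values and degrees of freedom reported for Models 1, 4, and 7 predicting naïve detectability scores. The helper names are illustrative, and the survival function is implemented only for one and two degrees of freedom, where closed forms exist; a statistics library would normally be used for the general case.

```python
from math import erfc, exp, sqrt


def chi_square_sf(x, df):
    """Upper-tail probability of a chi-squared variate (df = 1 or 2 only)."""
    if df == 1:
        return erfc(sqrt(x / 2))
    if df == 2:
        return exp(-x / 2)
    raise ValueError("closed forms are provided for df = 1 and df = 2 only")


def likelihood_ratio_test(chi2_full, df_full, chi2_nested, df_nested):
    """Chi-squared difference test of a nested model against a fuller model."""
    delta_chi2 = chi2_nested - chi2_full
    delta_df = df_nested - df_full
    return delta_chi2, delta_df, chi_square_sf(delta_chi2, delta_df)


# Model 1 (full three paths) vs. Model 4 (inhibition and shifting only).
diff_4, ddf_4, p_4 = likelihood_ratio_test(34.97, 30, 38.78, 31)

# Model 1 vs. Model 7 (shifting only); two paths are constrained, so the
# difference test has two degrees of freedom.
diff_7, ddf_7, p_7 = likelihood_ratio_test(34.97, 30, 42.50, 32)
```

A small p-value means that constraining the paths significantly worsened the fit, so at least one of the constrained paths carries unique predictive value.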


Table 10
Fit Indices for the Structural Equation Models Predicting Naïve Detectability Scores

                                                                        Coefficients for specified paths
Model                                         df   χ²      SRMR   IFI   Updating   Inhibition   Shifting
1. Full three paths                           30   34.97   .06    .96   .30†       .15          .06
2. Two paths from updating and inhibition     31   35.08   .06    .96   .32*       .17          --
3. Two paths from updating and shifting       31   35.77   .07    .96   .36*       --           .13
4. Two paths from inhibition and shifting     31   38.78   .07    .93   --         .32†         .14
5. One path from Updating                     32   36.25   .07    .96   .42**      --           --
6. One path from Inhibition                   32   39.31   .07    .93   --         .40*         --
7. One path from Shifting                     32   42.50   .08    .90   --         --           .37†

Note. † p < .10, two-tailed. * p < .05, two-tailed. ** p < .01, two-tailed.

The results from the likelihood ratio tests indicated that the full three-path model (Model 1) provided a significantly better fit than the two-path model with paths from inhibition and shifting (Model 4), χ²D(1) = 3.81, p = .05, and the one-path model with a single path from shifting (Model 7), χ²D(2) = 7.53, p = .02. It did not, however, provide a significantly better fit than the two-path model with paths from updating and inhibition (Model 2), χ²D(1) = 0.18, p = .67, the two-path model with paths from updating and shifting (Model 3), χ²D(1) = 0.66, p = .42, the one-path model with a single path from updating (Model 5), χ²D(2) = 0.72, p = .70, or the one-path model with a single path from inhibition (Model 6), χ²D(2) = 5.73, p = .06. Further tests indicated that the one-path model with a single path from updating (Model 5) provided as good a fit as the two-path model with paths from updating and inhibition (Model 2), χ²D(1) = 1.17, p = .28, as well as the two-path model with paths from updating and shifting (Model 3), χ²D(1) = 0.48, p = .49. Given that the three one-path models have the same degrees of freedom, likelihood ratio tests are unable to assess whether any particular one-path model provides a significantly better fit than any of the other one-path models. Since the three one-path models have an equivalent number of free parameters, however, the model chi-squared values can be directly compared, with smaller values indicating better fit. Of the three one-path models, the model with a single path from updating had the best fit (Model 5).

The above procedure was repeated with the eight models predicting the informed detectability scores. The fit indices and standardized regression coefficients for these models are presented in Table 11.

Table 11
Fit Indices for the Structural Equation Models Predicting Informed Detectability Scores

                                                                   Coefficients for specified paths
Model                                        df   χ²     SRMR  IFI   Updating  Inhibition  Shifting
1. Full three paths                          30   32.81  .06   .97   .36*      .11         -.06
2. Two paths from updating and inhibition    31   32.94  .06   .98   .35*      .09         --
3. Two paths from updating and shifting      31   33.25  .06   .98   .41**     --          -.03
4. Two paths from inhibition and shifting    31   38.25  .07   .93   --        .32†        .02
5. One path from Updating                    32   33.28  .06   .99   .40**     --          --
6. One path from Inhibition                  32   38.26  .07   .94   --        .33*        --
7. One path from Shifting                    32   42.18  .08   .90   --        --          .22

Note. † p < .10, two-tailed. * p < .05, two-tailed. ** p < .01, two-tailed.

The results from the likelihood ratio tests indicated that the full three-path model (Model 1) provided a significantly better fit than the two-path model with paths from inhibition and shifting (Model 4), χ²_D(1) = 5.44, p = .02, and the one-path model with a single path from shifting (Model 7), χ²_D(2) = 9.37, p = .01. It did not, however, provide a significantly better fit than the two-path model with paths from updating and inhibition (Model 2), χ²_D(1) = 0.13, p = .72, the two-path model with paths from updating and shifting (Model 3), χ²_D(1) = 0.44, p = .51, the one-path model with a single path from updating (Model 5), χ²_D(2) = 0.47, p = .79, or the one-path model with a single path from inhibition (Model 6), χ²_D(2) = 5.45, p = .07. Further tests indicated that the one-path model with a single path from updating (Model 5) provided as good a fit as the two-path model with paths from updating and inhibition (Model 2), χ²_D(1) = 0.34, p = .56, as well as the two-path model with paths from updating and shifting (Model 3), χ²_D(1) = 0.03, p = .86. Of the remaining three one-path models, the model with a single path from updating had the best fit (Model 5). This means that people with poor working memory updating skills tended to be poor liars.

Discussion

Research Question 1: Replications

As in Study 1, it was considered appropriate to assess whether the experimental procedures used in the current study produced data in line with previous lie detection studies. The results demonstrated that on average sender detectability was significantly greater than chance performance (participant-senders were somewhat detectable). The results also demonstrated that on average sender credibility was marginally greater than that expected if there was no veracity effect (participant-senders tended to be believed). Furthermore, a comparison of the 95% confidence intervals suggests that the effects observed in the current study were not significantly different from those observed in the previous study. Taken together, these results suggest that the experimental procedures used in both studies are convergently valid with those used in previous lie detection research.

Research Question 2: Self-Report Measures

The results from the original self-report behavioural control measure are again consistent with the idea that cueing people to their credibility results in near-maximum levels of behavioural control. Indeed, the mean levels of self-reported behavioural control for the informed truth and deceptive conditions were both significantly higher than the mean level of self-reported behavioural control for the naïve truth condition. Importantly, the self-reported behavioural control scores for the informed truth and deceptive conditions were once again quite similar, with both groups of scores lying close to the top anchor of the self-report scale and exhibiting relatively little within-condition variability. These results are largely consistent with those from the supplementary self-report measure of behavioural control. As the ratings for each condition were made at the same time on the supplementary self-report measure, and the scale anchors were set such that the ceiling of the supplementary measure reflected maximal levels of behavioural control, it is unlikely that the results reflect a truncated measurement. Rather, the results from both the original self-report and supplementary self-report measures support the hypothesis that cueing people to their credibility results in near-maximum levels of behavioural control.

As in the previous study, the results from the original self-report cognitive load measure indicated that cueing people to their credibility also increased the amount of
cognitive load they experienced. Indeed, the mean levels of self-reported cognitive load for the informed truth and deceptive conditions were both higher than the mean level for the naïve truth condition. Contrary to the results from the previous study, however, the mean level of self-reported cognitive load for the deceptive condition was only marginally higher than the mean level for the informed truth condition. A possible explanation for these discrepant findings is that the addition of the probe questions increased the cognitive complexity of the false opinion task, as they were designed to do. This explanation is supported by a post hoc inspection of the 95% confidence intervals for the mean levels of self-reported cognitive load, with the mean levels for the naïve truth (M = 4.90, 95% CI [4.58, 5.21]) and informed truth (M = 5.77, 95% CI [5.52, 6.02]) conditions of the current study significantly higher than those observed in the previous study (M = 3.90, 95% CI [3.56, 4.25]; M = 4.58, 95% CI [4.21, 4.94], respectively). Interestingly, the mean levels of self-reported cognitive load for the deceptive condition were similar across the two studies. However, when the ratings were made on the supplementary self-report measure of cognitive load, not only was the mean level for the deceptive condition significantly higher than the mean level for the informed truth condition, but the magnitude of the difference was greater than the magnitude of the difference between the naïve truth and informed truth conditions. Specifically, the difference between the informed truth and deceptive conditions accounted for around 23% of the variances for the informed truth and deceptive conditions, whereas the difference between the naïve truth and informed truth conditions accounted for around 17% of the variances for the naïve truth and informed truth conditions.
This suggests that while the addition of the probe questions caused perceptions of cognitive load to exceed the ceiling of the original self-report measure, the general finding that the unique aspects of deception cause a greater rise in perceptions of cognitive load than cueing people to their credibility is consistent across both studies.

In the previous study, the deceptive condition induced higher levels of self-reported emotional activation than the naïve truth and informed truth conditions, with the between-group differences accounting for around 14% and 23% of the respective variances. In the current study, however, the deceptive condition induced only slightly higher levels of self-reported emotional activation than the naïve truth and the informed truth conditions, with the between-group differences accounting for around 9% and 1%
of the respective variances. These results are similar to those observed on the supplementary self-report measure. A post hoc inspection of the 95% confidence intervals for the effect sizes indicates that this discrepancy can be explained by an increase in the mean levels of self-reported emotional activation for the naïve truth (Study 1 M = 4.49, 95% CI [4.62, 5.27]; Study 2 M = 5.59, 95% CI [5.33, 5.85]) and informed truth (Study 1 M = 4.73, 95% CI [4.40, 5.06]; Study 2 M = 5.85, 95% CI [5.65, 6.06]) conditions. The mean levels for the deceptive conditions were approximately the same across the studies. It may be the case that the participant-senders felt less confident in their ability to justify their opinions when they were faced with the probe questions, resulting in heightened levels of emotional activation. When the task of producing a false opinion is evaluated against the background of this heightened arousal, however, it may have been viewed as relatively benign.

Another important effect of the probe questions was that the veracity judgements collected in the current study demonstrated a higher level of intra-sender consistency than those collected in the previous study. While the levels were still below the 0.7 level suggested by Nunnally and Bernstein (1994), it is important to point out that this recommendation applies to items designed to measure the same construct. As participant-senders in the current study only provided one message per condition, the precise level of intra-sender consistency cannot be determined. The results do indicate, however, that participant-senders who were perceived as truthful in one condition tended to be perceived as truthful in the other conditions as well.
This finding suggests that the addition of the probe questions increased the amount of information that was available in the recordings and thus decreased the amount of measurement error associated with the veracity judgements. While the previous study provided ambiguous evidence regarding the relationship between executive and deceptive ability measures, the current study’s results support the idea that the executive demands of deception are causally relevant to the diagnosticity of behavioural displays. Specifically, the results from the series of SEMs predicting the naïve detectability scores showed that when inhibitory control and set shifting were modelled in isolation, their coefficients were intermediate to large in size. When they were modelled in concert with working memory updating, however, the significance and magnitude of their effects were usurped by the coefficient for working memory updating. This suggests that individual differences in inhibitory control and set

shifting do not contribute much to the prediction of the naïve detectability scores once individual differences in working memory updating have been taken into account. Working memory updating, on the other hand, always predicted the naïve detectability scores, irrespective of whether it was modelled in isolation or in concert with set shifting and inhibitory control (those with poor working memory updating skills tended to be more detectable in the false opinion task). This conclusion is strengthened by the fact that the full three-path model had a significant coefficient for working memory updating, but much lower and non-significant coefficients for inhibitory control and set shifting. Furthermore, it does not appear as though the paths for inhibitory control and set shifting were competing for shared variance in the full three-path model, as when either coefficient was constrained the other coefficient remained non-significant.

The results from the series of SEMs predicting the informed detectability scores were similar to those of the SEMs predicting the naïve detectability scores, with individual differences in working memory abilities accounting for the majority of the predictive relationship (poor working memory skills were associated with higher sender detectability scores). Furthermore, individual differences in working memory abilities explained approximately the same amount of variance across both sets of analyses. This suggests that while people find controlling their expressive behaviours to be cognitively demanding, as indicated by the self-report measures, the demands are either not related to the executive system or are not sufficiently taxing of the executive system to influence the diagnosticity of behavioural displays.
The demands associated with message production, on the other hand, do appear to be sufficiently taxing of the executive system to influence the diagnosticity of behavioural displays. In the context of the false opinion paradigm, stating an opinion likely involves the activation of information in long-term memory and its subsequent transfer to working memory where it is organized into a coherent narrative (Ericsson & Kintsch, 1995; Kintsch, 1998). As previous research has demonstrated that people have a tendency to seek out information that confirms their existing beliefs rather than information that might disconfirm them – known as confirmation biases (Klayman & Ha, 1987; Nickerson, 1998) – it follows that the amount of opinion-consistent information that is held in working memory would be greater than the amount of opinion-inconsistent information. When the message is truthful, this opinion-consistent information may be conveyed in the message and then removed from working memory,

allowing other information to take its place in an ongoing sequence of activation, transfer, organization, and output. When the message is deceptive, however, the opinion-consistent information cannot be conveyed in the message, thus it must be held in working memory and suppressed. As it is likely that the opinion-consistent information constitutes the majority of the activated information, fewer working memory resources would be available to organize the opinion-inconsistent information into a coherent narrative. The results from the current study suggest that those with poor working memory abilities are more affected by this process and manifest more observable signs of cognitive strain. These cognitive cues are then witnessed by receivers and correctly associated with deception.

The results of the current study are consistent with the neuroimaging results reported by Christ et al. (2009). In their study, they demonstrated that deception-related activity showed isolated overlap with regions of the brain implicated in working memory, but not with the regions associated with inhibitory control or set shifting. It is important to point out that the results from the current study do not provide evidence regarding the nature of the functional relationship between greater neural activation and deception performance. From an individual differences perspective, it may be the case that greater activation in brain regions associated with working memory reflects a system in overload, with greater activation associated with poor deception performance. Alternatively, it may be the case that greater activation in brain regions associated with working memory reflects a more efficient system, with greater activation associated with better deception performance.
While further research may shed light on the functional contribution of the brain to the behavioural correlates of deception, at present the nature of this functional contribution remains speculative. While the results of the current study support the idea that the estimated relationships observed in the previous study were attenuated by excessive measurement error, and that when this error is reduced individual differences in executive abilities, particularly working memory abilities, predict individual differences in sender detectability, it is important to point out that the findings of the current study are correlational in nature. This means that while they may be consistent with a causal explanation, they do not directly demonstrate that changes in working memory abilities are associated with changes in sender detectability. The next chapter in this thesis presents an empirical study that uses a resource depletion framework to experimentally

manipulate executive abilities before a deception task, thus providing evidence regarding whether the relationships observed in the current study are causal in nature.

Chapter 7

Study 3 – Impairing Deception Performance by Depleting Working Memory

The previous study demonstrated that the executive demands of deception are causally relevant to the diagnosticity of behavioural cues. More specifically, the previous study demonstrated that while the demands placed on inhibitory control and set shifting resources are insufficient to influence the incidence and/or intensity of diagnostic behavioural displays, and thus receiver perceptions of credibility, the demands placed on working memory resources are sufficient to influence the diagnosticity of behavioural displays, with poor working memory abilities associated with higher sender detectability scores. While this finding is intriguing, its correlational nature does not provide sufficient evidence to advance causal conclusions, thus the aim of the study presented in this chapter is to investigate whether manipulating the availability of working memory resources influences the diagnosticity of behavioural displays, and thus receiver perceptions of credibility.

While the literature pertaining to the cognition of deception has emphasized executive control processes as crucial to the execution of a deceptive message, the closely related concept of self-control has received relatively little attention from deception researchers. Baumeister, Vohs, and Tice (2007) defined self-control as “the capacity for altering one’s own responses, especially to bring them into line with standards such as ideals, values, morals, and social expectations, and to support the pursuit of long-term goals” (p. 351). This definition places self-control as a core aspect of adaptive human behaviour, thus its relative neglect in deception research is surprising.
Traditionally, the concepts of self-control and executive control have been studied from somewhat different perspectives, with social and personality researchers studying the former and cognitive researchers studying the latter (Hofmann, Schmeichel, & Baddeley, 2012). Over the past two decades, however, the similarities between the two concepts have garnered attention, with cognitive concepts such as working memory and inhibitory control being integrated into contemporary models of self-control (Engle, Kane, & Tuholski, 1999; Logan, Schachar, & Tannock, 1997). This is important because a particular feature of self-control, namely its energy/strength-like quality, has recently been shown to be applicable to executive control processes as well (Schmeichel, 2007).

According to the energy/strength model of self-control, the ability to alter one’s own responses is resource limited and subject to fatigue. The most common analogy equates the capacity for self-control to muscular strength. In brief, when muscles undergo strenuous activity they temporarily fatigue. This fatigue noticeably impairs subsequent attempts at strength expenditure. Accordingly, just as a muscle gets tired after use, so too does the capacity for self-control. Indeed, the self-control literature is replete with studies demonstrating that the capacity for self-control is impaired after prior expenditure. This temporary impairment in self-control has been demonstrated under a variety of conditions, such as cognitive load (Friese, Hofmann, & Wänke, 2008; Hofmann, Gschwendner, Castelli, & Schmitt, 2008; Ward & Mann, 2000), ego depletion (Hagger, Wood, Stiff, & Chatzisarantis, 2010; Vohs & Heatherton, 2000), environmental or social stressors (Inzlicht, McKay, & Aronson, 2006; Finkel et al., 2006), alcohol intoxication (Hofmann & Friese, 2008), stereotype threat and other high-stakes situations (Beilock & Carr, 2005; Johns, Inzlicht, & Schmader, 2008), mortality salience (Friese & Hofmann, 2008; Gailliot, Schmeichel, & Baumeister, 2006) and interracial interaction (Richeson & Shelton, 2003). Much of this research has been interpreted through what is known as a resource depletion framework (Baumeister, Vohs, & Tice, 2007; Hofmann, Schmeichel, & Baddeley, 2012).

While the vast majority of previous research in this area has focused on the concept of self-control, recent evidence suggests that executive control processes also exhibit depletion-like effects. Indeed, more contemporary theories of self-control posit state reductions in executive control processes as the underlying conceptual mechanism driving resource depletion effects (Hofmann, Schmeichel, & Baddeley, 2012).
In one of the first studies to observe these effects on measures of executive control, Van der Linden, Frese, and Meijman (2003) investigated whether mentally fatigued people demonstrate deficits in cognitive flexibility and planning – two concepts commonly recognized as executive control processes. They argued that a deficit in cognitive flexibility would manifest behaviourally as a tendency to perseverate on ineffective strategies whereas a deficit in planning would manifest behaviourally as a tendency to initiate actions without considering a strategy beforehand, as ineffective plans, or as increased planning time. To test their predictions, they assigned participants to either a fatigue condition or a non-fatigue condition. Participants in the fatigue condition had to work on a mentally taxing scheduling task for 2 hours whereas those in the non-fatigue
group had to bridge the 2 hours without engaging in any cognitively demanding activities. Once the 2 hours had elapsed, all participants completed a forward digit span task, the Tower of London task, and the WCST. The results indicated that the participants in the fatigued condition had significantly higher percentages of perseverative errors and discovered fewer sorting rules on the WCST compared to those in the non-fatigued condition. The participants in the fatigued condition also took significantly longer to initiate their first move on the Tower of London task compared to those in the non-fatigued condition. In contrast to the executive measures, no significant differences were observed on the forward digit span test, suggesting that the processes associated with working memory maintenance (a non-executive component of working memory) were unaffected by the scheduling task. This study is important because it is one of the first to demonstrate a specific effect of mental fatigue on behavioural measures of executive control.

While the results of Van der Linden et al. (2003) are consistent with the hypothesis that executive control processes are subject to resource depletion effects and that non-executive processes (those associated with working memory maintenance) may be resistant to such effects, the results do not identify the extent of the aftereffects (how long they last) or precisely which distinct executive control processes may be impaired and which may be unimpaired. Furthermore, the mentally taxing scheduling task was fairly non-specific, meaning that the cognitive components of the scheduling task are not well understood, thus the nature of the prior expenditure is unclear.
In a more precise application of the resource depletion framework, Schmeichel (2007) investigated whether completing an executive control task at time one impaired performance on another executive control task at time two. In the first experiment, participants were assigned to either a depleted condition or a control condition. Both groups watched a short television presentation which had distracting information at the bottom of the screen. Participants in the depleted condition were instructed to ignore the distracting information while those in the control condition were given no instructions. The author argued that trying to ignore the distracting information would consume attentional resources. After watching the short television presentation, half the participants completed an operation span task while the other half completed a sentence span task. Both of these tasks were considered to measure working memory. The results

indicated that the participants in the depleted condition performed significantly worse on both of the working memory tasks than those in the control condition.

To replicate and extend this finding to other executive control tasks, Schmeichel (2007) conducted a second experiment where the participants in the depleted and control conditions were required to write a short story. Participants in the depleted condition were instructed to write their story without using the letters a or n whereas those in the control condition were given no instructions. The author argued that writing a story without using the letters a or n would consume inhibitory control resources. After writing their stories, all participants completed a forward and reverse digit span task (tests of working memory maintenance and working memory updating, respectively). The results indicated that the participants in the depleted condition recalled fewer digits on the reverse digit span task than those in the control condition. On the forward digit span task, however, the participants in the depleted condition recalled approximately the same number of digits as those in the control condition. The author argued that the results from the first two experiments supported the idea that while executive control processes were subject to resource depletion effects, non-executive processes (those associated with working memory maintenance) were resistant to such effects.

In a third experiment, Schmeichel (2007) investigated whether completing a working memory updating task at time one produced a different aftereffect at time two compared to a working memory maintenance task. Participants were assigned to one of three conditions: an easy short-term memory condition (STM-2), a hard short-term memory condition (STM-6), or an executive condition (OSPAN).
While participants in the executive condition completed a slightly longer version of the operation span task that was used in the first experiment (evaluate mathematical equations while encoding target words for later recall), participants in the short-term memory conditions completed one of two memory tasks. These tasks contained the same mathematical equations and target words as used in the operation span task; however, the equation and word tasks were not intermixed (the equation evaluation task preceded the word task). Specifically, in the STM-2 condition, participants encoded two target words and then attempted to recall them immediately thereafter whereas in the STM-6 condition participants encoded six target words before attempting to recall them. The two short-term memory tasks were used to test an alternate account of the results observed in the
previous studies, namely that performing any effortful or difficult task impairs subsequent executive performance. After they had completed their respective memory tests, all participants were recorded as they watched an emotionally charged film clip while under instructions to hide their emotional reactions. Three independent judges then viewed the recordings and rated how emotionally expressive each participant’s face was. The results indicated that the participants in the depleted condition were perceived to have expressed more emotion on their faces than those in the STM-2 and STM-6 conditions. Furthermore, the participants in the STM-2 condition were perceived to have expressed approximately the same level of emotion on their faces as those in the STM-6 condition. The author argued that the results were consistent with predictions from an executive specific resource depletion framework.

Importantly, a close inspection of the results from Schmeichel’s (2007) third experiment suggests that the findings may not be as straightforward as the author proposed. Specifically, while the mean level of emotional expressivity was higher in the OSPAN condition than in the STM-2 and STM-6 conditions, the statistical tests used to assess the between-group differences (one-way ANOVA with follow-up contrasts) assume equal population variances. This assumption may be untenable given that the reported standard deviation for the OSPAN condition was more than three times the size of that for the STM-2 and STM-6 conditions. This is an important consideration as it suggests that the effect of prior expenditure may not have affected all the participants equally. That is, an increase in the standard deviation suggests that certain people may have been particularly susceptible to resource depletion while others may have been relatively resistant (increasing the variance in OSPAN performance).
When interpreted through a central capacity theory framework these results make sense, as reducing the amount of available resources would only have an impact on task performance if the task demands exceeded the amount of remaining resources. It may have been the case that participants with superior executive resources still had enough remaining resources to perform the subsequent task with little impairment (task performance was data-limited). Among those with poor executive abilities, however, the task at time one may have consumed enough of their executive resources such that the demands of the task at time two exceeded their remaining resources, resulting in less than perfect task performance (task performance was resource-limited). Depending on the proportion of people who were susceptible to the depletion effect, task performance may have been
linearly (most people were susceptible to the effect) or quadratically (only some people were susceptible to the effect) related to measures of executive abilities.

Taken together, the results from the series of experiments conducted by Schmeichel (2007) are consistent with the idea that executive control processes draw from a limited pool of consumable resources. Furthermore, the results are consistent with the idea that when the pool of executive resources has been partly depleted by a prior task, performance on subsequent tasks that draw from the same pool of resources tends to be impaired. Finally, the results are consistent with the idea that different forms of executive control draw from a common pool of consumable resources and that engaging in any executively demanding task will impair subsequent executive performance, but not attention and memory more generally.

With regard to the executive demands of deception, a resource depletion framework predicts that deception performance may be impaired if the deceptive act is preceded by a sufficiently demanding executive task. To date only one study has applied a resource depletion framework in an attempt to manipulate executive control processes both before (ego depletion) and during (goal neglect) a lie detection test. In the first of two studies, Debey, Verschuere, and Crombez (2012) assigned participants to either an ego depletion or a control condition and had them complete an e-hunting task. In the first part of the e-hunting task, all participants were instructed to cross out instances of the letter e on a page of text. In the second part of the task, participants in the control condition continued to cross out instances of the letter e whereas participants in the ego depletion condition were no longer allowed to cross out instances of the letter e that were adjacent to or two letters away from another vowel.
The authors argued that refraining from crossing out these instances of e would consume inhibitory control resources. After completing the e-hunting task, all participants completed an autobiographical questionnaire consisting of 36 common daily activities. Participants made categorical yes/no responses based on whether or not they had performed an activity on the day of the experiment. Their responses to this questionnaire served as the ground truth in a subsequent Sheffield lie test. In the Sheffield lie test, each participant’s responses from the autobiographical questionnaire were colour coded and presented on a computer screen. Importantly, participants were instructed to respond truthfully or deceptively depending on the colour. All questions were presented four times: once in each colour (truth vs. lie) and once per response-stimulus interval (RSI; 200 ms vs. 5000 ms). The authors argued that 200 ms RSI trials would promote attentional focus whereas 5000 ms RSI trials would debilitate attention, resulting in a state of goal neglect. The results indicated that the participants made more errors and had longer reaction times when lying than when telling the truth and that the 5000 ms RSI trials produced larger differences in error rates and reaction times than the 200 ms RSI trials. No differences were observed across the ego depletion and control conditions. The authors argued that while goal neglect impairs deception performance, ego depletion does not appear to affect deception performance.

To replicate and extend their earlier findings, Debey et al. (2012) conducted a second study where, rather than induce ego depletion via the e-hunting task, participants completed a verbal Stroop task before the Sheffield lie test. The results were similar to those from the first study, with no differences observed across the depletion and control conditions. The authors argued that the null findings may have been due to a lack of cognitive complexity among the tasks used to induce ego depletion. Indeed, no differences between the two conditions were observed on measures of self-reported effort and fatigue.

It may also be the case that the effect of the tasks at time one had worn off by the time participants completed the task at time two. After completing the e-hunting task in the first experiment, participants completed three questionnaires requiring a total of 73 responses. After completing the verbal Stroop task in the second experiment, participants performed 10 simple activities and then completed a questionnaire in which they had to indicate which activities they performed. They also completed two of the questionnaires from the first experiment, requiring a total of 37 responses.
The delay between the depletion task at time one and the deception task at time two may have been long enough that participants were no longer in a state of ego depletion by the time they completed the Sheffield lie test.

A resource depletion framework provides a mechanism whereby the availability of executive resources can be experimentally manipulated. In particular, the operation span task used in the third experiment conducted by Schmeichel (2007) was shown to influence the degree to which participants could control their expressive behaviours. As the previous study demonstrated that the executive demands of deception are sufficient to influence the diagnosticity of behavioural cues (deception performance is resource-limited when measured at the level of perceptions), it follows that the operation span task should also influence performance in a deception task. This is the focus of the current study.

General Overview and Research Questions

Like the previous studies reported in this thesis, the current three-part study uses two distinct samples of participants: a sample of participant-senders and a sample of participant-receivers. In part one, participant-senders were video and audio recorded while they produced four messages regarding controversial socio/political issues. The first two of these messages were truthful whereas the last two were deceptive. After participant-senders had recorded their first truthful message, they completed a short buffer task (listening to music for 10 minutes), after which they completed one of the three cognitive tasks (STM-2, STM-6, or OSPAN) used in the third experiment conducted by Schmeichel (2007). Participant-senders then provided another truthful message immediately after they had completed the cognitive task. They were then debriefed, completed another short buffer task, and then provided their first deceptive message. Participant-senders then completed their respective cognitive task again, after which they immediately provided another deceptive message.

The truthful messages produced in the current study were naïve truths (those where the participant-senders were not aware that their messages would be later evaluated for veracity). This was because the results from the previous studies reported in this thesis indicated that cueing participant-senders to their credibility increases self-reported levels of emotional activation, cognitive load, and behavioural control. While in certain situations these higher levels of emotional activation, cognitive load, and behavioural control may mirror the internal states of truth-tellers (when they are not taking their credibility for granted, for instance), the deceptive messages produced in the current study are considered to be low-stakes lies (there are no consequences for being disbelieved and only minor rewards for being believed).
As such, it was considered appropriate that the comparison truthful messages should be those where participant-senders are taking their credibility for granted and are not particularly concerned with making a credible impression (naïve truths).

In part two, participant-senders returned approximately one week after they had completed part one to complete three measures of working memory. In part three, an independent sample of participant-receivers evaluated the participant-sender messages produced in part one of the study by making dichotomous judgements regarding perceived veracity (truth vs. lie).

Research question 1: The effect of task type. The first research question pertains to the effects of the three different cognitive tasks. Specifically, if the executive demands of deception are causally relevant to the diagnosticity of behavioural cues, we would expect the accuracy rates for the deceptive messages produced after the OSPAN task to be higher than the accuracy rates for the deceptive messages produced before the OSPAN task. This is because the deceptive messages produced before the OSPAN task will draw from a larger pool of executive resources than the deceptive messages produced after the OSPAN task, and thus participant-senders should show more signs of cognitive strain when producing the deceptive messages that come after the OSPAN task than those that come before the task. Furthermore, if such an effect is caused by the amount of cognitive demand imposed by the task, rather than the type of demand, we would expect the magnitude of the effects to be a function of task difficulty (STM-2 < STM-6 < OSPAN) rather than of task type (STM-2 = STM-6 < OSPAN). Finally, if only the deceptive messages tax the executive system, we would not expect any effects to be observed among the truthful messages.

Research question 2: The moderating role of working memory. The second research question pertains to the moderating role of working memory abilities. Specifically, if the OSPAN task impairs subsequent deception performance in only those with poor working memory abilities, we would expect the amount of impairment to be quadratically related to measures of working memory. This is because the deception performances of those with poor working memory abilities may already be resource-limited under normal conditions (when measured at the level of receiver perceptions of credibility; see Studies 1 and 2), and thus a decrease in relevant resources would cause a decrease in deception performance. The deception performances of those with good working memory abilities, however, may not be resource-limited under normal conditions, and thus a decrease in relevant resources may not necessarily cause a decrease in deception performance (provided those with good working memory abilities still have enough resources to complete the task without observable signs of cognitive strain).

Method

Participant-Senders

One hundred and fourteen undergraduate psychology students from the University of New South Wales served as participant-senders and received partial course credit for their participation. Of the 114 participant-senders, 75 (65.79%) identified themselves as female and 39 (34.21%) as male. Their mean age was 20.78 years (range 18-35, SD = 2.02). Participant-senders completed part two of the study approximately one week after they had completed part one. Together, the two parts took approximately 2.5 hours to complete.

Participant-Receivers

Seven hundred and ninety-eight workers from Mechanical Turk served as participant-receivers and received US $1 for their participation. All of the participant-receivers indicated that they resided in the United States. Of the 798 participant-receivers, 486 (60.90%) identified themselves as female and 312 (39.10%) as male. The mean age was 32.84 years (range 18-75, SD = 11.70). Of the sample, 630 (78.95%) indicated that they most strongly identify as Caucasian, 95 (11.90%) as African American, 31 (3.88%) as Hispanic and 42 (5.27%) as other ethnic or cultural backgrounds. The majority of the sample indicated that they were native English speakers (n = 758, 94.99%), with the 40 (5.01%) non-native English speakers indicating that they had been speaking English for an average of 12.71 years (range 10-35, SD = 4.94). Participant-receivers completed part three of the study, which took approximately 15 minutes to complete.

Materials and Procedure: Part One

False opinion task. Participant-senders completed part one of the study. At the outset, they were misleadingly recruited to a study investigating the effect of music on opinion formation. Upon arrival, they were provided with a misleading consent form informing them that the purpose of the study was to investigate whether different types of music influence people’s opinions. They were told that they would first complete a short paper-and-pencil opinion survey which would be followed by several short recorded interviews where they would be required to provide a verbal justification for some of their responses on the survey. Importantly, participant-senders were told that the recordings were for transcription purposes only and that only the researchers involved in the study would be working with the materials. They were told that the purpose of the transcriptions was to convert their verbal account to a quantifiable data format. This deception was necessary to justify the recording of the interviews while not cueing participant-senders to their credibility.

After consenting to the misleading procedure, participant-senders were provided with a paper-and-pencil opinion survey. The survey was the same as that used in the previous study (Study 2) and consisted of 20 controversial socio/political opinion statements. The complete opinion survey is presented in Appendix D. Participant-senders were asked to carefully read the survey and to indicate their opinion regarding each item on the scale provided. The scale anchors were: (1) strongly disagree, (2) disagree, (3) neutral, (4) agree, and (5) strongly agree.

Once participant-senders completed the opinion survey, the researcher entered all the strongly endorsed opinion items (items where the participant-sender selected either the ‘strongly agree’ or ‘strongly disagree’ response) into an online number randomizer. Five strongly endorsed opinion items were then randomly selected to be the focus of the subsequent interviews. If the participant-sender did not strongly endorse five opinion items, the endorsed opinion items (items where the participant-sender selected either the ‘agree’ or ‘disagree’ response) were entered into the online number randomizer. The endorsed opinion items were then selected at random until there was a total of five opinion items (all the strongly endorsed items plus the randomly selected endorsed items). The strongly endorsed and randomly selected endorsed items were then randomized again so that the items would be randomly distributed over the truthful and deceptive conditions.
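The item selection procedure described above can be illustrated with a short sketch. This is not the procedure's actual implementation (the study used an online number randomizer operated by the researcher); the function name `select_interview_items`, the dictionary representation of the survey, and the use of Python's `random` module are all assumptions made for illustration.

```python
import random


def select_interview_items(responses, n_items=5, rng=None):
    """Pick the opinion items that the interviews will focus on.

    responses: dict mapping item number -> rating on the 1-5 survey scale.
    Hypothetical helper; the thesis used an online number randomizer.
    """
    if rng is None:
        rng = random.Random()
    # Strongly endorsed items: 'strongly disagree' (1) or 'strongly agree' (5).
    strong = [item for item, rating in responses.items() if rating in (1, 5)]
    # Endorsed items: 'disagree' (2) or 'agree' (4).
    endorsed = [item for item, rating in responses.items() if rating in (2, 4)]

    if len(strong) >= n_items:
        chosen = rng.sample(strong, n_items)
    else:
        # Keep every strongly endorsed item and top up with endorsed items.
        shortfall = n_items - len(strong)
        chosen = strong + rng.sample(endorsed, min(shortfall, len(endorsed)))

    # Randomize again so items are spread at random over the interviews.
    rng.shuffle(chosen)
    return chosen


# Example: a sender with three strongly endorsed items out of eight.
example = {1: 5, 2: 1, 3: 5, 4: 4, 5: 2, 6: 3, 7: 4, 8: 3}
print(select_interview_items(example, rng=random.Random(1)))
```

In this sketch neutral responses (3) are never selected, which follows the description above; how the original procedure handled a sender with fewer than five non-neutral responses is not stated, so the sketch simply returns fewer items in that case.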
After completing the opinion survey, participant-senders then moved on to the interview section of the false opinion task. As in the previous study, each interview began with the researcher asking the participant-senders “do you believe that [Australia should build more nuclear power plants]”. The text in brackets changed depending on which opinion item was the focus of the interview. The researcher then followed up the participant-senders’ initial responses with the same six probe questions that were used in the previous study.

Once participant-senders had completed the opinion survey, they completed a practice interview. They then completed the first recorded interview where they were given no further instructions on how to answer (truthful before). After the first interview, participant-senders were provided with a pair of headphones and told that they would now listen to randomly selected music for 10 minutes. While participant-senders were told that the music they were going to listen to was randomly selected, in fact all participant-senders listened to peace, meditation, and relaxation music. This rest period was important to the design of the study as it served as a buffer task between interviews and gave participant-senders’ executive resources time to recover from the demands of the preceding interview (see Tyler & Burns, 2008). To ensure that participant-senders did not engage in any executively demanding tasks during the rest period, they were told that for the experiment to work it was important that they do nothing but listen to the music during the 10-minute period.

After the rest period, participant-senders completed one of three cognitive tasks. They either completed the STM-2 task, the STM-6 task, or the OSPAN task. These tasks were identical to those used in the third experiment conducted by Schmeichel (2007). Immediately after completing their respective cognitive tasks, participant-senders completed the Brief Mood Introspection Scale (BMIS; Mayer & Gaschke, 1988). The BMIS was administered to investigate whether the three cognitive tasks differentially influenced participant-senders’ mood. Participant-senders also rated the difficulty of their respective cognitive task on a 7-point Likert scale. The scale anchors ranged from 1 (not at all difficult) to 7 (very difficult). Participant-senders then completed the second truthful interview (truthful after), after which they were debriefed as to the true aims of the experiment (measuring the ability to deceive) and asked to re-consent to the study.
Participants were not told the purpose of the cognitive tasks.

After re-consenting to the study, participant-senders completed a second 10-minute recovery period, after which they provided their first deceptive message (deceptive before). This message required participant-senders to argue in favour of an opinion-inconsistent position (opposite to what they had indicated on the opinion survey). After they provided their first deceptive message, participant-senders then completed a third 10-minute recovery period, after which they once again completed their respective cognitive task. They then immediately completed a second BMIS and difficulty rating, and then provided their second deceptive message (deceptive after). The entire procedure took approximately two hours to complete.


Cognitive tasks. The precise details of the STM-2, STM-6, and OSPAN tasks used in part one of the study are described below.

Operation span task (OSPAN). The OSPAN task required participant-senders to evaluate math equations while encoding target words for later recall. Specifically, participant-senders were presented with a math equation (e.g., 8 x 3 = 21) and had to indicate (yes vs. no) whether the given answer was correct. They were then presented with a target word (e.g., house) for later recall. One target word was presented after each equation. Participant-senders were presented with sets of two, three, four, or five equation-word pairings before being prompted to recall the target words in each set. They were not told how many words a set would include beforehand. There were a total of 16 sets (56 equation-word pairings), presented in the same order for each participant-sender. The number of words correctly recalled was recorded.

Memory maintenance tasks (STM-2 and STM-6). In both versions of the memory maintenance tasks, participant-senders evaluated the same mathematical equations and encoded/recalled the same target words as used in the operation span task; however, these tasks were not intermixed. In the short-term memory-2 task (STM-2), participant-senders were presented with two target words before being prompted to recall them whereas in the short-term memory-6 task (STM-6) participant-senders were presented with six target words before being prompted to recall them. While the STM-2 task consisted of 28 sets, the STM-6 task consisted of nine six-word sets followed by a 10th set containing only two words. This procedure ensured that each participant-sender attempted to encode/recall the same 56 target words. Immediately after completing the word recall phase of the task, participant-senders evaluated the same 56 mathematical equations that were used in the operation span task.
Recall performance was recorded.

Brief Mood Introspection Scale (BMIS). The BMIS is a mood adjective scale with an item sample of 16 adjectives, 2 selected from each of 8 mood states (Mayer & Gaschke, 1988). Participant-senders were presented with the 16 adjectives and asked to indicate how well each adjective or phrase described their present mood on a 7-point Likert scale (see Appendix F). The scale anchors were: 1 (definitely do not feel), 3 (do not feel), 5 (slightly feel), and 7 (definitely feel). While adjective scores can be used to construct up to four sub-scales, the current study only used the pleasant-unpleasant scale as it draws from each of the 16 adjectives and was considered to be the most relevant.
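As a quick check on the task descriptions above, the set structures of the three cognitive tasks can be tallied to confirm that each presents the same 56 target words. The breakdown of the OSPAN task into four sets at each of the four set sizes is an assumption (it is the even split consistent with the reported 16 sets and 56 pairings); the order in which the sets were presented is not reproduced here.

```python
# Set sizes for each task. The OSPAN split (four sets per size) is assumed;
# the text only reports 16 sets and 56 equation-word pairings in total.
ospan_sets = [2, 3, 4, 5] * 4
stm2_sets = [2] * 28          # 28 two-word sets
stm6_sets = [6] * 9 + [2]     # nine six-word sets plus a final two-word set

for name, sets in [("OSPAN", ospan_sets), ("STM-2", stm2_sets), ("STM-6", stm6_sets)]:
    # Report the number of sets and the total number of target words.
    print(name, len(sets), sum(sets))
# OSPAN 16 56
# STM-2 28 56
# STM-6 10 56
```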


Materials and Procedure: Part Two

Battery of working memory measures. The participant-senders that completed part one of the study returned approximately one week later to complete part two. In part two, participant-senders completed three tasks designed to measure working memory updating: an n-back task (Jaeggi et al., 2003; Jaeggi et al., 2010; Owen et al., 2005), a running memory span task (Bunting et al., 2006; Broadway & Engle, 2010), and a counting span task (Conway et al., 2005). While the n-back and running memory span tasks were the same as those described in the previous chapter, the counting span task presented participant-senders with a series of cards with green and yellow dots on them. Participant-senders were required to count the number of green dots on each card. After a certain number of cards had been presented (set size), participant-senders were prompted to recall the number of green dots they counted on each card (starting with the first card and going in order). The task began with a set size of 1 and had a maximum set size of 10. Each set size was presented twice. The task was terminated if the participant-sender did not correctly complete at least one of the two sets at a given set size. The dependent measure was the highest set size where at least one of the two sets was correctly recalled.
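The counting span scoring rule described above can be sketched as follows. The function name `counting_span_score` and the input format are hypothetical, and returning a score of 0 when neither set at set size 1 is recalled correctly is an assumption, as the text does not say how that case was scored.

```python
def counting_span_score(results, max_set_size=10):
    """Return the highest set size with at least one correctly recalled set.

    results: list of (first_correct, second_correct) booleans, one pair per
    set size, ordered from set size 1 upwards. Hypothetical input format.
    """
    score = 0  # assumed score if even set size 1 is failed twice
    for set_size, (first, second) in enumerate(results[:max_set_size], start=1):
        if not (first or second):
            break  # task terminates once both sets at a size are failed
        score = set_size
    return score


# A sender who passes sizes 1 and 2 and then fails both sets at size 3.
print(counting_span_score([(True, True), (True, False), (False, False)]))  # 2
```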

Materials and Procedure: Part Three

Online veracity judgement task. Participant-receivers completed part three of the study. The online veracity judgement task used in this study was similar to that described in the previous chapter, except that participant-receivers in this study evaluated 12 pseudo-randomly selected messages (rather than five). Furthermore, whereas the online veracity judgement tasks used in both of the previous studies (Studies 1 and 2) did not have any constraints regarding which conditions the messages were selected from (the tasks contained a combination of messages selected from the naïve truth, informed truth, and/or the deceptive conditions), the veracity judgement task in this study used a 2 x 3 mixed factorial design, with one within-subjects factor (veracity; truth vs. lie) and one between-subjects factor (condition; STM-2 vs. STM-6 vs. OSPAN). This ensured that each participant-receiver evaluated six truthful messages and six deceptive messages and that the 12 messages came from the same condition. This design allows receiver detection ability measures to be calculated and adjusted for in statistical analyses, a feature that is discussed in more detail in the next chapter.

In the online veracity judgement task, participant-receivers were first presented with an online consent form and instructions explaining that they were about to view 12 recorded interviews of people stating various opinions. They were told that while some of these people would be stating their true opinions, others would be stating false opinions. They were asked to evaluate each interviewee’s verbal and nonverbal behaviours and to indicate whether they thought each interviewee was being truthful or deceptive via a dichotomous veracity judgement (truth vs. lie).

After reading the instructions and providing their consent, participant-receivers were presented with an audio validation screen. This screen consisted of a blank text entry field and an audio recording explaining that in order to proceed they must enter the validation word ‘CAT’ into the text entry field. This ensured that all participant-receivers had functioning audio systems and were able to hear the participant-sender messages.

Once they entered the correct validation word, participant-receivers were pseudo-randomly presented with 12 participant-sender messages. The pseudo-random assignment ensured that no participant-sender would appear more than once per sequence, thus eliminating the possibility that the participant-senders would contradict themselves across their messages and that the participant-receivers could use this information to inform their veracity judgements. Each trial began with the presentation of a participant-sender message, during which time all the participant-receiver controls were locked. Once each participant-sender message had finished playing, the task automatically proceeded to the next screen (the response screen) where participant-receivers could indicate whether they thought the preceding participant-sender message was truthful or deceptive via a forced choice response (truth vs. lie).
After each participant-receiver had completed all 12 trials, a ‘catch trial’ was presented. This consisted of a video-audio recording of the researcher stating that “in order to ensure data integrity, please select the lie response on the next screen”. This trial served to identify participant-receivers who were not paying close attention or were responding randomly (Paolacci et al., 2010). No participant-receivers failed the catch trial. The online veracity judgement task took approximately 45 minutes to complete.
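The constraints on the pseudo-random message selection described above (12 messages from a single condition, six truthful and six deceptive, no participant-sender appearing twice) can be met in several ways. The sketch below shows one possibility only; the actual assignment algorithm is not described in the text, and the function name `draw_message_set` and the tuple representation of a message are assumptions.

```python
import random


def draw_message_set(senders, rng, n_per_veracity=6):
    """Draw one receiver's message sequence from a single condition.

    senders: list of sender identifiers from one condition (e.g. OSPAN).
    Returns a shuffled list of (sender, veracity, setting) tuples in which
    no sender appears more than once. Hypothetical sketch only.
    """
    # Sampling without replacement guarantees that senders are unique.
    chosen = rng.sample(senders, 2 * n_per_veracity)
    messages = []
    for index, sender in enumerate(chosen):
        veracity = "truth" if index < n_per_veracity else "lie"
        # Each sender recorded a 'before' and an 'after' message per veracity.
        setting = rng.choice(["before", "after"])
        messages.append((sender, veracity, setting))
    rng.shuffle(messages)  # mix truthful and deceptive trials
    return messages


# Example: 38 senders per condition (114 senders over three conditions).
for message in draw_message_set(list(range(1, 39)), random.Random(7)):
    print(message)
```

A scheme like this would not by itself balance how often each message is shown; any such balancing in the original task is not documented here.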


Results

Manipulation Checks

Task performance. To assess whether the objective difficulty of the three cognitive tasks increased as expected (OSPAN > STM-6 > STM-2) and whether the objective difficulty of the three cognitive tasks changed over the course of the experiment, a 2 x 2 mixed ANOVA was conducted with task type (STM-6 vs. OSPAN) entered as a between-subjects factor, task administration (first vs. second) entered as a within-subjects factor, and the number of target words correctly recalled on the cognitive tasks entered as the dependent variable. The STM-2 task was not included in the analysis as all but one of the participant-senders that completed this task correctly recalled all of the target words (task performance was at ceiling). This result is not surprising as the task simply involved recalling two serially presented words. This lack of performance variability on the STM-2 task indicates that the task was not sufficiently demanding to measure individual differences in working memory abilities (the task was not resource-limited). As the questions of interest pertained to the single main effect of task type and the two simple effects of task administration within each level of task type, the three planned contrasts were tested against a Šidák adjusted alpha level of .017. The mean scores for the three tasks and their respective 95% confidence intervals are presented in Figure 12.
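The Šidák adjusted alpha levels used for the planned contrasts in this chapter can be reproduced with the standard Šidák formula, 1 - (1 - α)^(1/k), where k is the number of contrasts. The sketch below assumes a family-wise alpha of .05, which is not stated explicitly here but is consistent with the reported values for three, five, and six contrasts.

```python
def sidak_alpha(n_contrasts, family_alpha=0.05):
    """Per-contrast alpha under the Sidak adjustment.

    family_alpha = .05 is an assumption consistent with the reported values.
    """
    return 1 - (1 - family_alpha) ** (1 / n_contrasts)


for k in (3, 5, 6):
    print(k, round(sidak_alpha(k), 4))
# 3 0.017
# 5 0.0102
# 6 0.0085
```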

[Figure 12 not reproduced. Y-axis: number of words correctly recalled (0 to 60); x-axis: STM-2, STM-6, OSPAN; series: first administration, second administration.]

Figure 12. Mean number of words correctly recalled by condition and task administration. Error bars represent 95% confidence intervals.

The analyses indicated that the mean difference between the STM-6 and OSPAN conditions (averaged over the levels of task administration) was statistically significant, F(1, 74) = 29.04, MSE = 94.97, p < .005, ηp² = .28, 90% CI [.14, .40], with participant-senders in the STM-6 condition (M = 36.13, SD = 9.81, 95% CI [32.98, 39.28]) correctly recalling an average of 12.05 more words (95% CI [7.59, 16.50]) than those in the OSPAN condition (M = 24.08, SD = 11.15, 95% CI [20.93, 27.23]). This suggests that the OSPAN task was objectively more difficult than the STM-6 task. The analyses also indicated that while the simple effect of task administration was not statistically significant within the STM-6 condition, F(1, 74) = 0.02, MSE = 59.86, p = .90, ηp² = .00, 90% CI [.00, .01], it was statistically significant within the OSPAN condition, F(1, 74) = 6.31, MSE = 59.86, p = .01, ηp² = .08, 90% CI [.01, .19]. Indeed, on the first administration of the OSPAN task, participant-senders correctly recalled an average of 3.15 more words (95% CI [0.65, 5.65]) than they did on the second administration of the task. This suggests that the participant-senders in the OSPAN condition may have been in a state of executive depletion during the second administration of the OSPAN task while those in the STM-6 condition were not.

Subjective task difficulty. To assess whether the subjective difficulty of the cognitive tasks increased as expected (OSPAN > STM-6 > STM-2) and whether the subjective difficulty of the different cognitive tasks changed over the course of the experiment, a 3 x 2 mixed ANOVA was conducted with task type (STM-2 vs. STM-6 vs. OSPAN) entered as a between-subjects factor, task administration (first vs. second) entered as a within-subjects factor, and the self-reported difficulty ratings entered as the dependent variable.
As the questions of interest pertained to the two main effects of task type and the three simple effects of task administration within each level of task type, the five planned contrasts were tested against a Šidák adjusted alpha level of .01. The mean scores and their respective 95% confidence intervals are presented in Figure 13.
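For readers who wish to check the reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this identity to the task performance contrasts reported above; the function name is illustrative, and the confidence intervals around ηp² are not reproduced because they require the noncentral F distribution.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Task performance contrasts, each reported with F(1, 74).
for label, f_value in [("STM-6 vs. OSPAN", 29.04),
                       ("administration within STM-6", 0.02),
                       ("administration within OSPAN", 6.31)]:
    print(label, round(partial_eta_squared(f_value, 1, 74), 2))
# STM-6 vs. OSPAN 0.28
# administration within STM-6 0.0
# administration within OSPAN 0.08
```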

[Figure 13 not reproduced. Y-axis: ratings of task difficulty (1 to 7); x-axis: STM-2, STM-6, OSPAN; series: first administration, second administration.]

Figure 13. Mean ratings of task difficulty by condition and task administration. Error bars represent 95% confidence intervals.

The analysis indicated that the mean difference between the STM-2 and STM-6 conditions (averaging over the levels of task administration) was statistically significant, F(1, 111) = 22.39, MSE = 2.19, p < .005, ηp² = .17, 90% CI [.07, .27], with the ratings for the STM-2 condition (M = 2.26, SD = 1.24, 95% CI [1.79, 2.74]) an average of 1.61 units lower (95% CI [0.93, 2.28]) than those for the STM-6 condition (M = 3.87, SD = 1.73, 95% CI [3.39, 4.34]). This suggests that the STM-6 task was subjectively more difficult than the STM-2 task. Likewise, the mean difference between the STM-6 and OSPAN conditions (averaging over the levels of task administration) was also statistically significant, F(1, 111) = 14.45, MSE = 2.19, p < .005, ηp² = .12, 90% CI [.04, .21], with the ratings for the STM-6 condition an average of 1.29 units lower (95% CI [0.62, 1.96]) than those for the OSPAN condition (M = 5.16, SD = 1.69, 95% CI [4.68, 5.63]). This suggests that the OSPAN task was subjectively more difficult than the STM-6 task. None of the simple effects of task administration were statistically significant. Indeed, the greatest difference was observed across the administrations of the STM-6 task, F(1, 111) = 2.00, MSE = 0.21, p = .16, ηp² = .02, 90% CI [.00, .08], with the ratings for the second administration of the STM-6 task an average of 0.11 units higher (95% CI [-0.04, 0.25]) than those for the first administration. This suggests that on average the subjective difficulty of all three of the tasks remained constant across the two administrations.


BMIS scores. To assess whether the three cognitive tasks caused different changes in mood over the course of the experiment, a 3 x 2 mixed ANOVA was conducted with task type (STM-2 vs. STM-6 vs. OSPAN) entered as a between-subjects factor, task administration (first vs. second) entered as a within-subjects factor, and the BMIS scores entered as the dependent variable. As the questions of interest pertained to the three simple effects of task administration within each level of task type, the three planned contrasts (first administration of STM-2 task vs. second administration of STM-2 task; first administration of STM-6 task vs. second administration of STM-6 task; first administration of OSPAN task vs. second administration of OSPAN task) were tested against a Šidák adjusted alpha level of .017. The mean scores and their respective 95% confidence intervals are presented in Figure 14.

[Figure 14 not reproduced. Y-axis: BMIS scores (-50 to 50); x-axis: STM-2, STM-6, OSPAN; series: first administration, second administration.]

Figure 14. Mean BMIS scores by condition and task administration. Error bars represent 95% confidence intervals.

The analyses indicated that the simple effect of task administration within the OSPAN condition was statistically significant, F(1, 111) = 32.38, MSE = 17.41, p < .0028, ηp² = .23, 90% CI [.12, .33], with the BMIS scores collected after the second administration of the OSPAN task an average of 5.45 units lower (95% CI [3.55, 7.34]) than those collected after the first administration of the task. This means that on average, participant-senders’ self-reported mood was lower after the second administration of the OSPAN task than after the first administration of the OSPAN task. By contrast, the simple effects of task administration within the STM-6 and STM-2 conditions were not statistically significant. Indeed, the greatest difference was observed in the STM-6 condition, F(1, 111) = 1.89, MSE = 17.41, p = .17, ηp² = .02, 90% CI [.00, .07], with the BMIS scores collected after the second administration of the STM-6 task an average of 1.31 units higher (95% CI [-0.58, 3.21]) than those collected after the first administration of the task. This means that on average, the self-reported mood of both groups of participant-senders was approximately the same after the second administration of the cognitive tasks as it was after the first administration of the cognitive tasks.

Research Question 1: The Effect of Task Type

To assess whether deception performance is impaired following the prior expenditure of executive resources and whether the effect is specific to deceptive messages, a 3 x 2 x 2 mixed ANOVA was conducted with task type (STM-2 vs. STM-6 vs. OSPAN) entered as a between-subjects factor, message veracity (truthful vs. deceptive) and message setting (before task administration vs. after task administration) entered as within-subjects factors, and the by-sender aggregated accuracy scores entered as the dependent variable. As the question of interest pertained to the six simple effects of message setting within each level of message veracity and task type, the six planned contrasts were tested against a Šidák-adjusted alpha level of .0085. The mean scores and their respective 95% confidence intervals are presented in Figure 15.

[Figure 15 omitted. Y-axis: Accuracy Scores (0.0 to 1.0). X-axis: task type (STM-2, STM-6, OSPAN), shown in two panels: Truthful Messages and Deceptive Messages. Legend: Before task administration, After task administration.]

Figure 15. Mean accuracy scores by task type and message setting. Error bars represent 95% confidence intervals.

The analyses indicated that the simple effect of message setting among the deceptive messages in the OSPAN condition was statistically significant, F(1, 111) = 12.31, MSE = 0.01, p < .005, ηp² = .10, 90% CI [.03, .19], with the accuracy rates for the deceptive messages produced after the administration of the OSPAN task an average of 8.95% higher (95% CI [3.89, 14.00]) than those for the deceptive messages produced before the administration of the task. This means that, on average, the deceptive messages produced after the administration of the OSPAN task were more detectable than those produced before it. Importantly, none of the other simple effects of message setting were statistically significant. Indeed, the next greatest difference was observed among the deceptive messages in the STM-6 condition, F(1, 111) = 0.96, MSE = 0.01, p = .33, ηp² = .01, 90% CI [.00, .06], with the accuracy rates for the deceptive messages produced after the administration of the STM-6 task an average of 2.50% higher (95% CI [-2.55, 7.55]) than those for the deceptive messages produced before the administration of the task. This means that only the OSPAN task seemed to affect the detectability of the deceptive messages and that none of the tasks affected the detectability of the truthful messages.
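The partial eta squared values reported for the planned contrasts can be recovered from the F ratios and their degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The following is a minimal sketch of that check; the function and variable names are illustrative only.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios reported for the four contrasts discussed in the text, all with df = (1, 111):
# OSPAN mood, STM-6 mood, OSPAN deceptive accuracy, STM-6 deceptive accuracy.
reported_f = [32.38, 1.89, 12.31, 0.96]
recomputed = [round(partial_eta_squared(f, 1, 111), 2) for f in reported_f]

print(recomputed)
```

The recomputed values agree with the effect sizes given alongside each F ratio (.23, .02, .10 and .01).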

Research Question 2: The Moderating Role of Working Memory

To assess whether the amount of impairment observed in the OSPAN condition was moderated by working memory abilities, a multiple regression analysis was conducted. The amount of impairment was calculated by subtracting the correct identification rate for the deceptive messages produced before the administration of the OSPAN task from the correct identification rate for the deceptive messages produced after the administration of the task. This variable was then regressed onto the working memory scores. Before interpreting the regression model, however, preliminary analyses were conducted to assess the dispersion of the outcomes from the working memory measures and the assumptions underlying the regression procedure.

To assess whether the working memory measures appropriately distributed individual differences across the outcomes, descriptive statistics, histograms and P-P plots were examined. While the distributions for the n-back task and the running span task appeared approximately normal in shape, the distribution for the counting span task appeared to contain a single outlier, with one participant-sender having a score 3.29 standard deviations from the mean. As the regression residuals appeared to be approximately normal in shape, however, the influence of the outlier was considered to be minimal. The descriptive statistics for the working memory scores are presented in Table 12.


Table 12
Descriptive Statistics for the Working Memory Scores

Measure          Mean     SD      Range            Skewness   Kurtosis
N-back           3.66     1.28    1.20 – 5.99      -0.32      -0.60
Running-span     33.52    2.57    27.62 – 38.06    -0.62      -0.42
Counting-span    6.11     1.78    2 – 10           -1.05      1.87

While most of the assumptions underlying the regression procedure appeared to be met, the assumption of linearity/collinearity was questionable. The zero-order correlations between the working memory scores and the amount of impairment observed in the OSPAN condition are presented in Table 13.

Table 13
Zero-Order Correlations Between the Working Memory Scores and the Amount of Impairment Observed in the OSPAN Condition

Variable                N-back    Running-span    Counting-span
N-back                  --        --              --
Running-span            .60**     --              --
Counting-span           .25       .34*            --
Amount of impairment    -.06      -.18            -.05

Note. * p < .05, two-tailed. ** p < .01, two-tailed.

An examination of Table 13 indicates that while the n-back and counting span tasks had little to no association with the amount of impairment observed in the OSPAN condition, the running span task was weakly associated with the amount of impairment, with higher running span scores associated with slightly less impairment. This relationship, however, was not statistically significant. To assess whether the working memory scores, taken as a whole, predicted the amount of impairment observed in the OSPAN condition, and whether the relationships were linear or quadratic in nature, a hierarchical multiple regression analysis was conducted. The first block of predictors contained the working memory scores (Model 1), whereas the second block of predictors contained the squared working memory scores (Model 2)²². The results of the analysis are presented in Table 14. The analysis indicated that the amount of impairment observed in the OSPAN condition was not linearly or quadratically related to the working memory scores.

22 The squared working memory scores were used to assess whether the amount of impairment was quadratically related to the working memory scores.

Table 14
Summary of Multiple Regression Analysis

                                               95% CI for β
Variable            B        SE B     β        LL        UL       t        p
Model 1
  N-back            0.01     0.03     0.08     -0.35     0.52     0.39     .70
  Running-span      -0.01    0.01     -0.24    -0.68     0.20     -1.09    .29
  Counting-span     0.00     0.01     0.01     -0.35     0.38     0.06     .96
Model 2
  N-back            0.15     0.13     1.26     -1.04     3.57     1.12     .27
  Running-span      0.20     .230     3.38     -6.83     13.58    0.67     .51
  Counting-span     0.04     0.05     0.62     -1.06     2.29     0.76     .45
  N-back²           -0.02    .02      -1.21    -13.38    6.69     -1.14    .26
  Running-span²     -0.00    .00      -3.62    -3.35     0.99     -0.73    .47
  Counting-span²    -0.00    0.00     -0.72    -2.47     0.99     -0.87    .39

Note. R² for Model 1 = .04, p = .72. R² for Model 2 = .14, p = .56.

Post Hoc Analysis: The Moderating Role of Mood Changes

While it appeared that the amount of deception impairment observed in the OSPAN condition was not moderated by individual differences in working memory abilities, the fact that the BMIS scores collected after the second administration of the OSPAN task tended to be lower than those collected after the first administration of the task meant that the observed impairment effect may have been due to decreases in mood rather than in the amount of available executive resources. To investigate this possibility, the variable reflecting the amount of deception impairment was regressed onto a variable representing the difference between the BMIS scores collected after the first administration of the task and those collected after the second administration of the task. The results indicated that the BMIS change scores were not significantly related to the amount of deception impairment, B = 0.01, SE = 0.01, t = 0.87, p = .39, β = .14, 95% CI [-.20, .49]. Importantly, the intercept term in the model was statistically significant, B = 0.07, SE = 0.03, t = 2.13, p = .04, indicating that the accuracy rates for the deceptive messages produced after the administration of the OSPAN task were expected to be an average of 7.03% higher (95% CI [0.34, 13.71]) than those for the deceptive messages produced before the administration of the task when BMIS change scores were zero (no decrease in mood).


Discussion

As expected, the OSPAN task was more cognitively demanding than the STM-6 task, which was more cognitively demanding than the STM-2 task. While the accuracy rates for the messages produced after the administration of the STM-2 and STM-6 tasks were approximately the same as those for the messages produced before the administration of the tasks, the accuracy rates for the deceptive messages produced after the administration of the OSPAN task were greater than those for the deceptive messages produced before the administration of the task. When interpreted within a resource depletion framework, this result is consistent with the idea that deception performance depends on the amount of available executive resources. When fewer executive resources are available, the executive demands of deception are processed less efficiently, resulting in more diagnostic behavioural cues and thus decreased perceptions of credibility. Furthermore, the fact that no differences were observed among the truthful messages provides support for the idea that the executive demands are specific to deceptive messages.

Importantly, no effects were observed in either the STM-2 or STM-6 conditions, ruling out the possibility that the effect observed in the OSPAN condition was simply an artefact of having engaged in any cognitively challenging task beforehand. It remains a possibility, however, that the effect observed in the OSPAN condition may have been caused by the demands of the OSPAN task exceeding a particular level of cognitive demand, rather than taxing a particular type of cognitive resource (executive resources).
The data from the current study cannot rule out this possibility because the STM-6 task differed from the OSPAN task not only in the type of cognitive demand it imposed but also in the amount of demand; the STM-6 task may therefore not have been sufficiently demanding to produce an impairment effect. However, this possibility is unlikely, as the STM-6 task was sufficiently demanding to exceed the critical threshold for perfect performance, suggesting that most participant-senders devoted all of their available cognitive resources to the task at hand (task performance was resource-limited).

Interestingly, performance on the second administration of the OSPAN task was slightly poorer than performance on the first administration of the task. Furthermore, the second administration of the OSPAN task appeared to cause a decrease in mood. Taken together, these findings suggest that participant-senders may have been in a state of executive depletion prior to the second administration of the OSPAN task. It is possible that this may have been a protracted effect from the first administration of the OSPAN task, or it may have been an effect of the first deceptive message itself. Alternatively, the reduced performance on the second administration of the task may have been caused by reduced mood, with the idea of completing the cognitively challenging task again being an unpleasant thought. While the multiple administrations of the OSPAN task appeared to reduce participant-senders' mood, the decrease in deception performance was only slightly moderated by mood changes, with those with a larger decrease in mood having slightly larger decreases in deception performance. This relationship, however, was not statistically significant. Importantly, even after these differences in mood had been accounted for, the decrease in deception performance was still statistically significant, although the magnitude of the effect was slightly reduced.

Contrary to predictions, the results from the OSPAN condition were inconsistent with the idea that the executive demands of deception are sufficient to influence the diagnosticity of behavioural cues in only those with poor working memory abilities. This is evidenced by the lack of a significant relationship between the amount of impairment in the OSPAN condition and the working memory measures. This suggests that the deceptive messages produced before the administration of the OSPAN task may have been sufficiently demanding of the working memory system to exceed the available resources of the majority of participant-senders, such that engaging in a prior executive task added an equal amount of load to all participant-senders. An alternative interpretation of the null finding is that the current study had insufficient statistical power to detect the underlying effect.
There is some evidence to support this assertion, as the point estimate for the relationship between the running-span task and the effect of message setting indicated a small effect, with the lower bound of the 95% confidence interval slightly below zero. A post hoc power analysis indicated that the statistical power of the current study for this effect was .15 for a small effect (r = .2), .53 for an intermediate effect (r = .35), and .81 for a large effect (r = .5). Given these findings, it is possible that a small or intermediate effect does exist in the population, but the data from the current study are insufficient to assess this hypothesis.

The results from the current study also support the idea that the executive demands associated with truth telling are not normally sufficient to cause behavioural cues that are associated with deception. In the studies conducted by Vrij and colleagues (Vrij et al., 2008; Vrij et al., 2009; Vrij et al., 2010; Vrij et al., 2012), it is possible that their experimental manipulations interacted with veracity such that they only induced higher levels of cognitive load in the deception conditions. While from a practical perspective this may be ideal, it does not afford an assessment of the normal cognitive demands of telling the truth. The experimental manipulations used in the current study, however, would have applied equally to the truth-telling and deception conditions. As depleting participant-senders' executive resources did not affect perceptions of credibility for truthful messages, this suggests that the executive demands of telling the truth may not normally exceed a person’s executive resources (truth telling is not resource-limited).

An important consideration regarding the results from the current study is that the order of the messages was fixed such that the truthful messages always preceded the deceptive messages. This design feature was necessary to maintain the naïve perception that the experiment was investigating the effect of music on opinion formation. If participant-senders had been aware of the true aims of the experiment, they might have engaged in higher than normal levels of behavioural control. While fixing the order of the message veracity factor may have influenced the discriminability of the messages, any order effects would be held constant across the different experimental conditions. As an effect was observed in the OSPAN condition, but not in the STM-2 and STM-6 conditions, the results cannot be explained by an order effect. Similarly, while the order of the message setting factor was also fixed, this was held constant across conditions such that any order effects would have manifested equally across conditions.


Chapter 8: Examining the use of statistics in deception research

Chapter 8
Study 4 – Examining the use of Statistics in Deception Research

This chapter presents a theoretical analysis of the different statistical methods commonly used to analyse deception data. It also introduces an alternative statistical procedure (Generalized Linear Mixed Effects models; GLMMs) that overcomes some of the limitations associated with the traditional analysis methods, and presents the results from a series of Monte Carlo simulation studies that investigate the efficacy of the different statistical procedures. As these simulations demonstrated the superiority of the alternative statistical procedure, a series of GLMMs were used to re-analyse the data from Study 3. The results of the re-analysis are presented in this chapter. The theoretical analysis and the Monte Carlo simulation studies have been accepted for publication in the peer-reviewed journal Psychology, Crime and Law.

The use of Statistics in Deception Research

A typical experimental paradigm used to investigate the detection of deception through verbal and non-verbal behaviours involves the recruitment of two independent samples: a sample of senders who are tasked with producing truthful and/or deceptive messages, and a sample of receivers who are tasked with evaluating the veracity of these messages. While on the face of it this methodology seems simple, properly understanding the outcomes is not as straightforward as previous statistical approaches might suggest. In a survey of the literature, Bond and DePaulo (2006) uncovered 206 studies that reported using this type of methodology, indicating how common it is.

To illustrate the different statistical approaches that have been applied in this area of research, it is helpful to further describe the most basic experimental design used to investigate deception detection. Consider a design where receivers and senders are fully crossed factors, meaning that all receivers evaluate all messages. In this design, veracity is manipulated within senders and receivers, with senders providing one truthful and one deceptive message (for the sake of simplicity), and receivers evaluating a series of truthful and deceptive messages by providing a dichotomous decision as to whether they think each message is truthful or deceptive.


Following Bond and DePaulo (2008), the outcomes from such a design can be organized into a rectangular accuracy matrix (see Table 15). In this matrix each receiver is assigned a unique row while each sender is assigned two columns: one for their truthful message, the other for their deceptive message. A 1 in any particular cell indicates that the respective receiver-sender pairing resulted in a correct decision, whereas a 0 indicates an incorrect decision. To calculate an individual participant’s accuracy, decision-level data is often aggregated, either by-receiver (across rows) or by-sender (down columns). To aggregate decision-level data by-receiver, each row’s marginal mean is calculated (thereby collapsing across senders), resulting in a detection accuracy score for each receiver (the percentage of correct decisions made by each receiver). To aggregate data by-sender, the marginal mean of each of the sender’s two columns is calculated and the two means are then averaged (thereby collapsing across receivers), resulting in a detection accuracy score for each sender (the percentage of correct decisions given for each sender). This participant-level data (either receiver or sender) is then often used as the dependent measure in statistical analyses.

Table 15
Example Decision-Level Dataset from a Hypothetical Deception Detection Experiment

                                    Senders
                  A             B             C             D
Receivers      Truth  Lie    Truth  Lie    Truth  Lie    Truth  Lie    Detection Ability
a                1     1       0     1       0     1       1     1          .75
b                1     0       0     1       0     0       0     0          .25
c                1     0       1     0       0     1       0     0          .38
d                1     0       0     1       0     1       1     1          .63
e                1     1       0     0       1     1       1     1          .75
f                1     0       1     1       1     1       1     0          .75
g                0     1       1     1       1     1       0     0          .63
h                0     1       0     0       1     1       0     0          .38
Detectability      .63           .51           .65           .44
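The two aggregation routes described above (by-receiver across rows, by-sender down columns) can be sketched directly from the cell entries of the hypothetical accuracy matrix. The data structure and function names below are illustrative assumptions and are not part of the original analysis.

```python
# Decision-level accuracy matrix (1 = correct decision, 0 = incorrect decision).
# Columns are ordered A-truth, A-lie, B-truth, B-lie, C-truth, C-lie, D-truth, D-lie.
decisions = {
    "a": [1, 1, 0, 1, 0, 1, 1, 1],
    "b": [1, 0, 0, 1, 0, 0, 0, 0],
    "c": [1, 0, 1, 0, 0, 1, 0, 0],
    "d": [1, 0, 0, 1, 0, 1, 1, 1],
    "e": [1, 1, 0, 0, 1, 1, 1, 1],
    "f": [1, 0, 1, 1, 1, 1, 1, 0],
    "g": [0, 1, 1, 1, 1, 1, 0, 0],
    "h": [0, 1, 0, 0, 1, 1, 0, 0],
}
sender_names = ["A", "B", "C", "D"]


def aggregate_by_receiver(matrix):
    """Row marginal means: proportion of correct decisions made by each receiver."""
    return {receiver: sum(row) / len(row) for receiver, row in matrix.items()}


def aggregate_by_sender(matrix, names):
    """Mean of each sender's truth-column and lie-column marginal means."""
    rows = list(matrix.values())
    n_receivers = len(rows)
    result = {}
    for k, name in enumerate(names):
        truth_mean = sum(row[2 * k] for row in rows) / n_receivers
        lie_mean = sum(row[2 * k + 1] for row in rows) / n_receivers
        result[name] = (truth_mean + lie_mean) / 2
    return result


detection_ability = aggregate_by_receiver(decisions)
detectability = aggregate_by_sender(decisions, sender_names)
```

Computed this way, the receiver scores match the Detection Ability column after rounding. For senders A and D the calculation reproduces the printed Detectability values (.63 and .44), whereas for senders B and C the cell entries as printed give .50 and .69 rather than .51 and .65.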

While the experimental design described above reflects a stereotypical lie detection study, it is important to acknowledge that many lie detection studies use designs which vary in terms of the types of messages they elicit (see Frank & Ekman, 1997; Ganis et al., 2003; Morgan et al., 2009; O’Sullivan, Frank, Hurley, & Tiwana, 2009), the manner in which senders/messages and/or receivers are allocated to conditions, and the types of veracity judgements being made (see Levine, 2001; Levine, Shaw, & Shulman, 2010). While the precise experimental design used in any particular study informs the specification of the statistical model used to analyse the outcomes, it is common practice to aggregate outcomes either by-receiver or by-sender. Regardless of the design, this participant-level data contains inherent biases that can mislead researchers. The following section illustrates this problem by examining how variation in participant-level data can be influenced by multiple sources.

Limitations of Participant-Level Data

One question that deception detection researchers have previously investigated using participant-level data produced from the basic experimental design described above is whether receivers accurately discriminate between truthful and deceptive messages. As the question of interest seemingly pertains to the accuracy of receivers (e.g., police officers), the decision-level data is typically aggregated by-receiver. That is, each row’s marginal mean is calculated and used as the dependent measure in statistical analyses. Indeed, Bond and DePaulo (2006) reported that percentage correct scores were available for 292 samples of receivers, demonstrating the prominence of data aggregation in this area. To answer the research question, certain characteristics of the sample’s distribution, specifically its mean and variance, are usually used to make inferences about receiver accuracy in the population (via a one-sample t-test). If the sample mean is unlikely to be observed (less than 5% of the time) on a distribution of means influenced only by random sampling error, the researcher would likely conclude that receivers can, on average, discriminate between lies and truths. Indeed, when these types of studies are conducted, receiver accuracy rates are around 54% (Bond & DePaulo, 2006). Importantly, though, researchers are interested in making inferences about the population of receivers rather than the specific sample they have observed. Yet previous research has overlooked some of the limitations imposed by the experimental method when making such inferences. Namely, while a significant result (p < .05) does lend a certain degree of confidence about population-level inferences, conclusions are necessarily restricted to the particular sample of senders/messages used in the study.
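As a concrete illustration of the by-receiver procedure just described, the sketch below runs a one-sample t-test against chance (50%) on the eight receiver scores from the hypothetical accuracy matrix (captioned Table 15). The chance level of .5, the variable names and the hard-coded critical value are assumptions made for the example; the calculation uses only the receivers' row totals and therefore treats the senders as fixed.

```python
import math
import statistics

# Number of correct decisions (out of 8) for receivers a-h in the hypothetical matrix.
correct_counts = [6, 2, 3, 5, 6, 6, 5, 3]
n_judgements = 8
receiver_scores = [count / n_judgements for count in correct_counts]


def one_sample_t(sample, null_mean):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - null_mean) / standard_error, n - 1


t_stat, df = one_sample_t(receiver_scores, 0.5)

# Two-tailed 5% critical value of t with 7 degrees of freedom (from standard tables).
T_CRIT_DF7 = 2.365
significant = abs(t_stat) > T_CRIT_DF7
```

Only variability between receivers enters the standard error here, which is exactly the limitation at issue: nothing in the test reflects how the result might change with a different sample of senders.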
By aggregating the decision-level data by-receiver and subjecting the resulting participant-level data to a one-sample t-test, the researcher has modelled the sample-to-sample variability in receiver mean scores (the movement in sample means due to random sampling error). This consideration is what allows inferences to be made about the overall population of receivers. What the procedure fails to do, however, is to model the sample-to-sample variability associated with senders/messages. That is, the procedure ignores any potential variation in sender accuracy scores, implicitly treating the senders/messages as a fixed factor. It may be the case that the observed mean receiver accuracy rate was significantly higher than chance because the particular sample of senders/messages used in the study happened to be poor at deceiving (highly detectable), or lower than chance because the senders were good at deceiving (less detectable). Similarly, when the data is aggregated by-sender it is not clear whether the observed sample’s performance reflects the underlying characteristics of the population of senders, or whether it is an artefact of the particular sample of receivers who rated the messages. Simply put, receiver and sender accuracy scores depend on each other, and each sample of participants is subject to sampling error. More appropriate procedures would acknowledge the multiple sources of sampling error and treat both receivers and senders as random factors.

While the terms “fixed factor” and “random factor” are used inconsistently in the literature (Gelman & Hill, 2007), the most common distinction between the two relies on the sampling of factor levels. While a comprehensive review is beyond the scope of this chapter, fixed factors here are those factors whose levels represent all possible levels of that factor (the levels of the factor are exhaustive). For example, an experiment that contains both a treatment and a control group could justifiably model the group effect as a fixed factor because the levels are exhaustive.
Participants either receive the treatment or they do not, and there are no other groups that the researcher wishes to generalize the results to. Random factors, on the other hand, are those whose levels represent a sample from a larger population (Green & Tukey, 1960). In deception research the receivers/senders involved in the study do not constitute all the possible receivers/senders worldwide, thus they are more appropriately treated as random factors in analyses.

The problem of misidentifying fixed and random factors, and its implications for population-level inferences, originally came to prominence in the psycholinguistics literature. Clark (1973) argued that the linguistic materials used in psycholinguistic experiments (such as a selection of verbs) are not created by the researcher; rather, they are sampled from a larger population of naturally occurring linguistic materials (all possible verbs) which have idiosyncratic differences between them. Failure to include this item variability became known as the “language as a fixed effect fallacy” (Clark, 1973; Forster & Dickinson, 1976), and we argue that this is the same issue facing deception detection researchers today. Senders/receivers are not created by the researcher; rather, they are sampled from a population of senders/receivers who differ from each other in terms of their ability to deceive/detect deception. While a variety of methods to incorporate such variability have been proposed, such as quasi-F ratios and min F′ statistics (Forster & Dickinson, 1976; Santa, Miller, & Shaw, 1979), in practice these methods are seldom used (Jaeger, 2008; Judd, Westfall, & Kenny, 2012; Raaijmakers, Schrijnemakers, & Gremmen, 1999).

Nevertheless, associating random effects with both receivers and senders is important in deception detection research, as the outcome of any individual decision is influenced not only by the particular receiver’s characteristics but also by the particular sender’s characteristics. The results from the meta-analysis conducted by Bond and DePaulo (2008) indicated that the measurement-corrected standard deviation in accuracy scores was over five times larger when the scores were aggregated by-sender than when they were aggregated by-receiver. This finding implies that if a particular receiver’s observed accuracy is high, it may be because the particular sample of senders they evaluated contained easy-to-detect senders. That is, regardless of the particular receiver’s long-run average accuracy, a highly detectable sender will have an increased probability of being judged correctly.
Conversely, if a particular receiver’s accuracy is low, it may be because the particular sample of senders they evaluated contained hard-to-detect senders. To account for these differences, the variation in both sender and receiver scores must be explicitly modelled.

In addition to estimating the variation in the accuracy of deception judgements, Bond and DePaulo (2008) also estimated the variation in receiver credulity (the tendency to regard others as truthful) and sender credibility (the tendency to be regarded as truthful), reporting that the measurement-corrected standard deviation was more than twice as large when the results were aggregated by-sender than when they were aggregated by-receiver. This is an important consideration because participant bias can inflate the variance in accuracy scores when veracity is manipulated between-subjects. If a highly credible sender is allocated to the truth condition, the probability of a correct decision for each cell in that sender’s respective column is increased, regardless of the sender’s long-run average detectability. Given that the variability in sender characteristics is relatively large compared to receiver characteristics, accuracy scores may depend more on which sender gets allocated to which condition.

It may be argued that, although any single study may not explicitly model the multiple sources of variability, the fact that the ‘slightly above chance performance’ finding with regard to receiver accuracy has been highly replicated with many different random samples allows the generalization of the finding to extend to both populations. While this argument may be reasonable, aggregating decision-level data either by-receiver or by-sender can produce misleading results, particularly when researchers are concerned with estimating the magnitude of treatment or intervention effects.
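One common way of writing down a model that treats both receivers and senders as random factors is a logistic model with crossed random intercepts. The notation below is an assumption introduced for illustration and is not taken from the text: y_rs is the dichotomous outcome (1 = correct decision) when receiver r judges a message from sender s, β₀ is the overall log-odds of a correct decision, and u_r and v_s are receiver-specific and sender-specific deviations.

```latex
\[
\operatorname{logit}\,\Pr(y_{rs} = 1) = \beta_0 + u_r + v_s,
\qquad
u_r \sim \mathcal{N}(0, \sigma_u^2),
\quad
v_s \sim \mathcal{N}(0, \sigma_v^2)
\]
```

In this sketch, σ_u² captures variation in receivers' detection ability and σ_v² captures variation in senders' detectability, so that both sources of sampling error contribute to the uncertainty around β₀. Treatment or veracity effects would enter as additional fixed terms on the right-hand side.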

Treatment Effects in Traditional Analysis of Variance Procedures

While early research into deception detection focused primarily on the ability of receivers to discern lies from truths, researchers have also investigated methods for improving accuracy, including modified interview protocols, receiver training and an array of specialized tools (for a review, see Vrij, 2008). Many of these studies, however, use participant-level data as the dependent measure in analyses. To illustrate how data aggregation has serious consequences when estimating treatment effects in deception detection research, the following section examines the impact of aggregation using a hypothetical study investigating the difference between two interview protocols.

When investigating different interview protocols, the basic experimental paradigm is similar to that outlined earlier, with one important difference: there is more than one method for eliciting sender messages. That is, interview protocols differ by condition. Consider an experiment that has two independent factors: interview protocol (new vs. old) and veracity (truth vs. lie). Interview protocol varies between senders and receivers, whereas veracity varies between senders but within receivers. In a typical analysis using by-receiver data, the design would be specified as a 2 x 2 mixed factorial design with receivers evaluating both truthful and deceptive messages (within-subjects factor), but only one of two possible sets of messages: those elicited by the new protocol or those elicited by the old protocol (between-subjects factor)²³.

The question of interest here is whether the new protocol elicits higher mean receiver accuracy rates compared to the old protocol. That is, does the new interview protocol make lies and truths easier to detect? To investigate this question in our hypothetical study, the decision-level data is aggregated by-receiver and then subjected to a two-way mixed Analysis of Variance (ANOVA). The analysis reveals a significant main effect of interview protocol²⁴. The researcher concludes that the two population means likely differ. They further conclude that the observed difference was caused by the different interview protocols. The risk in making this second inference, however, is that the analysis confounds the effect of interview protocol with sender effects. The difference between experimental conditions might be (partly) due to differences between the set of senders used in each condition rather than solely attributable to the interview type.

Importantly, by analysing the outcomes using a traditional ANOVA, the researcher can be confident that the two sets of sender messages do produce different mean receiver accuracy rates. What is unclear is whether the different mean receiver accuracy rates were caused by the experimental manipulation or by random sampling error among senders/messages. In the event that a highly detectable sender was allocated to the new protocol condition, the mean receiver accuracy rate for this condition will be higher (everything else being equal) than the mean receiver accuracy rate for the old protocol condition. Failing to account for the potential variation among senders/messages may result in the traditional ANOVA approach having an increased risk of capitalizing on chance (Type 1 error).
Previous research has shown that the failure to account for systematic variation between stimulus items results in inflated Type 1 error rates (Forster & Dickinson, 1976; Rietveld & Van Hout, 2007; Wickens & Keppel, 1983). These studies

23 In an analysis using by-sender data the design would be specified as a 2 x 2 fully between-subjects factorial design where senders either tell the truth or tell a lie (between-subjects factor) while being interviewed with one of the two protocols; either the new protocol or the old protocol (between-subjects factor).
24 Also of interest would be the interaction between interview protocol and veracity, but for the sake of clarity we do not discuss this here.

have shown that, generally speaking, the greater the variation between stimulus items (in this case senders/messages), the greater the inflation of the Type 1 error rate. This is particularly concerning for deception detection researchers as, according to the findings reported in Bond and DePaulo (2008), the variability between senders is relatively large compared to the variability between receivers.

One way to minimise the variability in stimulus items across experimental conditions is to use repeated measures designs where possible. In the case of the two different interview protocols discussed earlier, by having each sender complete both interview protocols (or tell lies and truths in other designs), the variability due to individual differences between senders is held constant across conditions. As each sender appears in both sets of messages the researcher can infer that a significant increase or decrease in mean receiver accuracy rates was caused by the experimental manipulation and not random sampling error among senders/messages. It is important to note, however, that while repeated measures designs effectively hold the idiosyncratic differences between senders constant across conditions, thus reducing the probability of Type 1 errors, receiver accuracy rates still depend on the particular sample of senders being evaluated. This means that if the variability between senders is not incorporated into the statistical model (senders are treated as a fixed factor) the inferences regarding receiver accuracy rates are still restricted to the particular sample of senders used in the study. For studies investigating methods to increase detection accuracy, the primary goal of research should be the isolation of any potential treatment effects and generalization to the broader populations of interest.
We are concerned with whether the new interview protocol increases detection accuracy in the population, regardless of the particular sample of receivers and senders. To reliably infer causal treatment effects, statistical techniques must simultaneously account for both the variability between receivers and the variability between senders.
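The confound described in this section can be illustrated with a small simulation. The following is a minimal sketch and not the simulation procedure used in this chapter. It assumes a simplified design with a single factor that varies between senders and between receivers (no veracity factor), no true effect of condition, receivers with no individual differences, and sender detectabilities drawn from a normal distribution with a hypothetical mean of 0.54 and standard deviation of 0.12. All function names and parameter values are illustrative assumptions.

```python
import random
import statistics


def two_sample_t(a, b):
    """Pooled-variance t statistic for two groups of equal size."""
    n = len(a)
    se = ((statistics.variance(a) + statistics.variance(b)) / n) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def simulate_once(rng, n_senders=10, n_receivers=50, mean_p=0.54, sd_p=0.12):
    """One null experiment; returns (by-receiver t, by-sender t)."""
    by_receiver, by_sender = [], []
    for _condition in range(2):
        # Each condition gets its own random sample of senders.
        # Both samples come from the same population: no true effect.
        p = [min(max(rng.gauss(mean_p, sd_p), 0.05), 0.95)
             for _ in range(n_senders)]
        # hits[r][s] is 1 if receiver r judged sender s correctly.
        hits = [[1 if rng.random() < ps else 0 for ps in p]
                for _ in range(n_receivers)]
        # Aggregate the same decisions two ways.
        by_receiver.append([sum(row) / n_senders for row in hits])
        by_sender.append([sum(col) / n_receivers for col in zip(*hits)])
    return two_sample_t(*by_receiver), two_sample_t(*by_sender)


def type1_rates(n_sims=300, seed=1):
    """Proportion of null experiments rejected by each analysis."""
    rng = random.Random(seed)
    # Two-sided .05 critical values for df = 98 (receivers) and df = 18
    # (senders); these match the default sample sizes above only.
    crit_receiver, crit_sender = 1.984, 2.101
    rejected_receiver = rejected_sender = 0
    for _ in range(n_sims):
        t_receiver, t_sender = simulate_once(rng)
        rejected_receiver += abs(t_receiver) > crit_receiver
        rejected_sender += abs(t_sender) > crit_sender
    return rejected_receiver / n_sims, rejected_sender / n_sims


if __name__ == "__main__":
    by_receiver_rate, by_sender_rate = type1_rates()
    print("by-receiver Type 1 rate:", by_receiver_rate)
    print("by-sender Type 1 rate:", by_sender_rate)
```

Under these assumptions the by-receiver test should reject the null hypothesis far more often than 5% of the time, because its error term reflects only decision-level noise among receivers and ignores the sampling of senders. The by-sender test should stay close to the nominal level, but only because receivers were assumed to be interchangeable in this sketch.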

Generalized Linear Mixed Models Applied to Decision-Level Data

Since Clark (1973), alternative statistical techniques have been developed which can jointly model the variability between receivers and senders without the methodological restrictions or computational complexities of earlier solutions. The following section introduces generalized linear mixed models (GLMM) as they might be applied to the study of deception detection. As these models are extensions of classical linear regression models it is useful to begin with a basic regression model fitted to decision-level data. To illustrate the GLMM approach, consider a hypothetical experiment of the kind we have been discussing. The experiment has two factors: condition (varying between receivers and senders) and veracity (varying between senders but within receivers). To avoid the problems caused by data aggregation the decision-level data must be modelled directly. As this data is binary in nature a logistic regression model is the natural starting point. The basic logistic regression model would be specified as (see footnote 25):

Model 1

$$\Pr(y_i = 1) = \mathrm{logit}^{-1}(\beta_0 + \beta_1 v_{1i} + \beta_2 c_{2i} + \beta_3 vc_{3i})$$

Here, $y_i$ represents the outcome of decision $i$, $v$ represents the factor for veracity and $c$ represents the factor for condition. This model contains four parameters: one for the intercept ($\beta_0$), one for the effect of veracity ($\beta_1$), one for the effect of condition ($\beta_2$) and another for the interaction between veracity and condition ($\beta_3$). While this model allows for a full assessment of the effects of interest, it does not account for the dependencies in the data (the fact that each sender and each receiver are associated with multiple observations) and thus violates a core assumption of the logistic regression model (independence). The outcomes associated with a particular receiver or a particular sender may be more or less likely to be correct than those associated with a different receiver or a different sender, respectively. To account for these dependencies the intercept term in the basic logistic regression may be changed such that it is allowed to vary between receivers and between senders. Such a model would be specified as:

Model 2

$$\Pr(y_{ijs} = 1) = \mathrm{logit}^{-1}(\beta_0 + \beta_1 v_{1ijs} + \beta_2 c_{2ijs} + \beta_3 vc_{3ijs} + u_{0j} + u_{0s})$$

Here, $y_{ijs}$ represents the outcome of decision $i$ made by receiver $j$ when rating sender $s$. These additional subscripts acknowledge that individual decisions are associated with particular receivers and with particular senders. In this model the

25 We note that different authors use different notations. In this chapter we have followed the notation used by Goldstein (2003).

intercept term has two variances associated with it: one estimating the variation in the intercept due to receivers ($u_{0j}$) and the other estimating the variation in the intercept due to senders ($u_{0s}$). These additional terms account for the dependencies in the data by adjusting the associated coefficient ($\beta_0$) for each level of the grouping factors. Essentially, each receiver and each sender has their own intercept. With these terms included, the fixed effects in the model are estimated at the average level of receiver and sender performance. Although Model 2 is a considerable improvement over Model 1, it implicitly assumes that the effect of veracity remains constant over receivers. However, the results from Bond and DePaulo (2008) suggest that some receivers exhibit a strong tendency to believe messages while others exhibit a strong tendency to disbelieve messages. This variability can also be incorporated into the model by allowing the slope associated with veracity to vary between receivers. This model would be specified as:

Model 3

$$\Pr(y_{ijs} = 1) = \mathrm{logit}^{-1}(\beta_0 + \beta_1 v_{1ijs} + \beta_2 c_{2ijs} + \beta_3 vc_{3ijs} + u_{0j} + u_{0s} + u_{1j})$$

In Model 3, the slope associated with veracity has one variance associated with it ($u_{1j}$). This estimates the between-receiver variation in the slope for veracity and reflects differences in receiver truth bias. The larger the slope for a particular receiver the more likely that receiver is to correctly identify a typical deceptive message relative to a typical truthful message, regardless of their accuracy. There is no random effect incorporating the variation in the slope due to senders as they are not crossed with veracity in this design.
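To make the role of each term concrete, the linear predictor described for Model 3 can be written as a short function. This is a minimal sketch under stated assumptions rather than the model-fitting code used in this thesis: it assumes 0/1 dummy coding with v = 1 for deceptive messages and c = 1 for one of the two conditions, and it applies the receiver-specific slope deviation to the veracity dummy, following the description of the slope for veracity varying between receivers. The function names and all numeric values are hypothetical.

```python
import math


def inv_logit(x):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def p_correct(v, c, beta, u0_receiver=0.0, u0_sender=0.0, u1_receiver=0.0):
    """Probability of a correct judgement for one decision.

    v, c         -- 0/1 dummy codes for veracity and condition (assumed coding)
    beta         -- fixed effects (b0, b1, b2, b3)
    u0_receiver  -- this receiver's deviation from the average intercept
    u0_sender    -- this sender's deviation from the average intercept
    u1_receiver  -- this receiver's deviation from the average veracity slope
    """
    b0, b1, b2, b3 = beta
    eta = ((b0 + u0_receiver + u0_sender)
           + (b1 + u1_receiver) * v
           + b2 * c
           + b3 * v * c)
    return inv_logit(eta)


if __name__ == "__main__":
    beta = (0.2, -0.1, 0.0, 0.0)  # hypothetical fixed effects on the log-odds scale
    # An average receiver judging an average sender's deceptive message.
    print(p_correct(v=1, c=0, beta=beta))
    # The same message from a more detectable sender (positive sender deviation).
    print(p_correct(v=1, c=0, beta=beta, u0_sender=0.8))
```

Setting every random deviation to zero gives the prediction for the average receiver and the average sender, which is the level at which the fixed effects are interpreted.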

Empirical Type 1 Error Rates of the Different Statistical Procedures

The following section uses Monte Carlo simulations to examine the effectiveness of the different statistical procedures that have been discussed when they are applied to data from a hypothetical deception detection experiment. Just like the experimental design introduced in the previous section, each simulated experiment had two independent factors: condition (varying between receivers and senders) and veracity (varying between senders but within receivers). The sample sizes were set to 100 receivers and 100 senders with participants being randomly assigned to the levels of the independent factors. Means and standard deviations for the underlying population parameters were set to the meta-analytic estimates reported in Bond and DePaulo (2008). See Appendix G for more information. In total, the data from 500 simulated experiments were analysed using both traditional ANOVA approaches (by-receiver and by-sender data aggregation) as well as a generalized linear mixed model (Model 3 from the preceding section). The decision criterion was p < .05 for the ANOVAs and t > 2 for the GLMM (see footnote 26). Importantly, the simulations did not include a positive or negative effect of condition, thus when the decision was to reject the null hypothesis it was considered a Type 1 error. The results of the simulations are presented in Figure 16.

[Figure 16: bar chart. Vertical axis: Type 1 Error Rate (0.0 to 1.0). Horizontal axis: by-receiver, by-sender, GLMM. Series: Main Effect, Interaction.]

Figure 16. Empirical Type 1 error rates by statistical method.

The results of the Monte Carlo simulations show the importance of including sender effects as a potential source of variation when analysing deception detection data. When decision-level data is aggregated by-receiver the empirical Type 1 error rate is over six times larger than the nominal level (5%). This means that if a researcher conducts an experiment with a similar design there is around a 35% chance that they will detect an effect, even though in reality there is none. The inflated error rate was observed for both the main effect of condition as well as for the interaction between

26 Calculating precise p values for fixed effects estimates produced by GLMMs is difficult given the uncertainty associated with identifying the correct degrees of freedom for the t and F distributions that the p values are based on. While there are numerous approximations available, for the purposes of this chapter we can reasonably expect the t-distribution to approximate the normal distribution (the dataset is fairly large and balanced), thus we can assume a coefficient is ‘significant’ if its t-value is greater than 2 (see Baayen, Davidson, & Bates, 2008).

condition and veracity. Of the 500 simulated experiments, 177 (35.4%) falsely discovered a main effect while 158 (31.6%) falsely discovered an interaction. While the by-receiver analyses had a substantially inflated Type 1 error rate, the by-sender analyses had empirical Type 1 error rates (4.8% for the main effect and 5.8% for the interaction) close to the nominal level. This can be explained by the fact that the variation in receiver accuracy scores was very small compared to the variation in sender accuracy scores. Effectively, the variation in receiver accuracy scores was so small that the samples of receivers were somewhat interchangeable. This means that any particular receiver’s accuracy score was almost entirely determined by the sample of sender messages they evaluated. While the interchangeability of receivers has received some support in the literature (Bond & DePaulo, 2008; Levine et al., 2011), the issue remains somewhat controversial (see O'Sullivan, 2008). In reality, if the variance in receiver accuracy scores is greater than that specified in the simulations presented here, the Type 1 error rate for the by-sender analyses would be larger.

Another explanation for why the by-receiver and by-sender results differ so widely is the fact that veracity was manipulated between senders but within receivers. As was stated earlier, bias can inflate the variance in accuracy scores when veracity is manipulated between-subjects. The sampling error associated with senders in the by-receiver analyses includes variance attributable to differences in accuracy as well as differences in credibility, while in the by-sender analyses the sampling error includes only the variance attributable to differences in accuracy.

Much like the by-sender analyses, the GLMM analyses also had an empirical Type 1 error rate (5.8% for the main effect and 5.2% for the interaction) close to the nominal level.
The difference between the two is that the GLMM procedure directly incorporates the variation in receiver accuracy scores while the by-sender ANOVA does not. This means that researchers do not have to invoke controversial arguments regarding the interchangeability of receivers for the results to be accepted. Rather, the procedure appropriately isolates the treatment effects, thus removing the need for additional assumptions.
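Because each empirical error rate is based on 500 simulated experiments, it carries Monte Carlo sampling error of its own. The sketch below assumes a normal approximation to the binomial distribution (an assumption that is not stated in the text). It computes the range of rejection rates that would be expected if the true rate were exactly 5%, and checks the rates reported above against that range. The function name is illustrative.

```python
import math


def monte_carlo_band(true_rate, n_sims, z=1.96):
    """Approximate 95% band for an observed rejection rate over n_sims runs."""
    se = math.sqrt(true_rate * (1.0 - true_rate) / n_sims)
    return true_rate - z * se, true_rate + z * se


# Rejection rates reported in the text (500 simulated experiments each).
reported = {
    ("by-receiver", "main effect"): 177 / 500,
    ("by-receiver", "interaction"): 158 / 500,
    ("by-sender", "main effect"): 0.048,
    ("by-sender", "interaction"): 0.058,
    ("GLMM", "main effect"): 0.058,
    ("GLMM", "interaction"): 0.052,
}

low, high = monte_carlo_band(0.05, 500)
# True where the reported rate is compatible with a true rate of 5%.
consistent = {key: low <= rate <= high for key, rate in reported.items()}

if __name__ == "__main__":
    print(f"band under a true 5% rate: {low:.3f} to {high:.3f}")
    for (method, effect), ok in consistent.items():
        print(method, effect, "within band" if ok else "outside band")
```

On this approximation the by-sender and GLMM rates are indistinguishable from the nominal level given 500 simulations, whereas the by-receiver rates fall well outside the band.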


Statistical Power Considerations

In addition to inferential and alpha rate considerations, it is also important to consider the statistical power of the two tests. That is, the by-sender ANOVA and GLMM procedures likely differ with regard to their ability to discover underlying effects. To examine this, a further 500 simulations were conducted where a positive effect was associated with condition. That is, in the following simulations the new interview protocol actually increased the underlying probability of a correct identification compared to the old interview protocol, rather than having no effect as in the earlier simulations. This effect conveyed a 10% advantage to deceptive messages that were elicited by the new interview protocol. This means that we can compare not only the power of each statistical procedure to detect the main effect of condition, but also the interaction between condition and veracity. The results of the simulations are presented in Figure 17.

[Figure 17: bar chart. Vertical axis: Discovery rate (0.0 to 1.0). Horizontal axis: by-sender, GLMM. Series: Main effect, Interaction.]

Figure 17. True discovery rates by statistical method.

The results of the additional Monte Carlo simulations show that the by-sender ANOVA procedure is less powerful than the GLMM procedure. Of the 500 simulated experiments, only 197 (39.4%) of the by-sender analyses detected the underlying main effect of condition compared to 357 (71.4%) of the GLMM analyses. For the interaction between condition and veracity, the GLMM procedure had marginally more power, with 225 (45%) of the analyses detecting the underlying interaction compared to 195 (39%) of the by-sender analyses. Taken together the results from the simulations demonstrate that the GLMM procedure produces fewer errors and affords greater statistical power compared to traditional ANOVA procedures.

These results also suggest that the typical sample sizes found in the deception detection literature may yield insufficient statistical power to discover certain underlying effects. Even with the more powerful GLMM procedure, the probability of discovering the interaction between condition and veracity was considerably less than the 0.8 suggested by Cohen (1992). Several studies in the deception detection literature hypothesize interactions. For example, the cognitive load approach to deception detection attempts to increase the cognitive demand of the interview in an attempt to elicit more diagnostic cues to deception while sparing behaviours associated with truth telling (Vrij et al., 2008; Vrij, Leal, Mann, & Fisher, 2012). While the treatment effect used in the simulations presented here was small (10%), with larger effects being more detectable, the typical sample sizes used in this area are unlikely to yield sufficient statistical power for a robust test of hypotheses. While specific power calculations are beyond the scope of this chapter and depend on many factors, it is clear that researchers hypothesizing interactions should use sample sizes greater than those used in the simulations presented here.

Re-analysis of Study 3 Data

Given the results of the simulation study, it was considered prudent to re-analyse the data from the third study presented in this thesis using a series of GLMMs. The data from the first and second studies presented in this thesis are not suitable for re-analysis as veracity was not balanced across participant-receivers (they were not presented with an equal number of truthful and deceptive messages), thus estimates of by-receiver performance cannot be calculated. The series of GLMMs that were fit to the data from the third experiment sought to estimate the six simple effects of message setting at each level of message veracity and task type while accounting for the multiple sources of dependencies present in the data. All models contained the same 12 parameters in the fixed part of the model: (1) an intercept term, (2) a coefficient for the effect of message setting, (3) a coefficient for the effect of veracity, (4) a coefficient for the effect of the second task type, (5) a coefficient for the effect of the third task type, (6) a coefficient for the two-way interaction between message setting and veracity, (7) a coefficient for the two-way interaction between message setting and the second task type, (8) a coefficient for the two-way interaction between message setting and the third task type, (9) a coefficient for the two-way interaction between veracity and the second task type, (10) a coefficient for the two-way interaction between veracity and the third task type, (11) a coefficient for the three-way interaction between message setting, veracity and the second task type, and (12) a coefficient for the three-way interaction between message setting, veracity, and the third task type. To obtain estimates for the six simple effects of interest the specified model was fit to the data six times, with each fit containing a different combination of reference levels for task type and veracity. This meant that each of the six coefficients for message setting reflected one of the simple effects of interest. To test the significance of each simple effect, standard errors were computed via parametric bootstrapping and used to construct 95% confidence intervals for the simple effects. The dependent variable in all models was accuracy, with 0 indicating an incorrect identification and 1 indicating a correct identification.

Before fitting the models to the data, a series of preliminary models was used to determine the most appropriate random effects structure. Barr, Levy, Scheepers, and Tily (2013) argued that when assessing interactions in mixed models “one should have by-unit [subject or item] random slopes for any interactions where all factors comprising the interaction are within-unit; if any one factor involved in the interaction is between-unit, then the random slope associated with that interaction cannot be estimated, and is not needed” (p. 275).
Following this guideline, the first preliminary model that was fit to the data had a maximal random effects structure and included by-receiver and by-sender intercepts, by-receiver slopes for the effect of veracity, by-sender slopes for the effect of veracity, by-sender slopes for the effect of message setting, and by-sender slopes for the two-way interaction between message setting and veracity. Upon fitting this model, however, several of the random effect components were unable to be estimated and appeared to be redundant parameters. To test whether the inclusion of each term was warranted the random effect terms were sequentially dropped from the model and the parameters re-estimated. Constrained models were then compared to the next highest order model via a series of likelihood ratio tests. This trimming procedure resulted in three random effects being retained in the model: the by-receiver intercepts, the by-sender intercepts, and the by-sender slopes for the effect of veracity. It should be noted that throughout the model trimming procedure the fixed and random estimates did not substantively change. The estimated probability of a correct identification for each cell in the design and the respective 95% confidence intervals are presented in Figure 18 (see footnote 27).

[Figure 18: two-panel chart (Truthful Messages, Deceptive Messages). Vertical axis: Probability of Correct Identification (0.0 to 1.0). Horizontal axis in each panel: STM-2, STM-6, OSPAN. Series: Before task administration, After task administration.]

Figure 18. Probability of a correct identification by task type and message setting. Error bars represent 95% confidence intervals.
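The conversion from log odds to probabilities described in footnote 27 can be sketched as follows. This is a minimal illustration of that arithmetic only. The function names and the input values are hypothetical and are not estimates from the fitted models.

```python
import math


def log_odds_to_probability(log_odds):
    """Exponentiate the log odds, then divide the odds by the odds plus one."""
    odds = math.exp(log_odds)
    return odds / (odds + 1.0)


def interval_to_probability(lower_log_odds, upper_log_odds):
    """Convert both ends of a log-odds interval for a single cell estimate.

    The conversion is monotonic, so the lower bound stays the lower bound.
    """
    return (log_odds_to_probability(lower_log_odds),
            log_odds_to_probability(upper_log_odds))


if __name__ == "__main__":
    estimate, lower, upper = 0.40, 0.10, 0.70  # hypothetical log-odds values
    print(log_odds_to_probability(estimate))
    print(interval_to_probability(lower, upper))
```

A log-odds value of zero corresponds to a probability of 0.5, which is a convenient check on the conversion.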

The results from the analyses were similar to those from the original ANOVA procedure presented in the previous chapter. Indeed, the probability of correctly identifying a deceptive message that was produced after the administration of the OSPAN task was around 9.33% higher (95% CI [4.27, 14.21]) than the probability of correctly identifying a deceptive message that was produced before the administration of the OSPAN task. This estimate is approximately the same as the estimate produced by the ANOVA model (the accuracy rates for the deceptive messages produced after the administration of the OSPAN task were 8.95% higher (95% CI [3.89, 14.00]) than those produced before the administration of the OSPAN task). None of the other planned comparisons were statistically significant (all the confidence intervals included 0). Indeed, the next greatest difference was observed among the deceptive messages in the STM-6 condition, with the probability of correctly identifying a deceptive message that was produced after the administration of the STM-6 task around 4.80% higher (95% CI [-0.28, 9.78]) than the probability of correctly identifying a deceptive message that was produced before the administration of the STM-6 task. While this estimate is slightly larger than the estimate produced from the original ANOVA model (the accuracy rates for the deceptive messages produced after the administration of the STM-6 task were 2.50% higher (95% CI [-2.55, 7.55]) than those produced before the administration of the STM-6 task), neither difference is statistically significant (the confidence interval

27 The original outputs from the series of GLMMs (log odds) were converted to probabilities by taking the exponential of the log odds and then dividing the resulting odds by itself plus one.

includes zero). An inspection of the 95% confidence interval for the estimate produced by the GLMM, however, suggests that the result was more likely to be observed if the alternative hypothesis was true (there were indeed small relationships present in the population) than if the null hypothesis was true. This finding is intriguing as the STM-6 task is not thought to involve any executive control processes; rather it is thought to involve processes related to working memory maintenance. It may be that the task simply depleted attentional resources (common executive control resources) and not working memory updating resources per se. While no depletion effects associated with the STM-6 task were observed in the study conducted by Schmeichel (2007), it may be the case that the dependent executive measures used in that study were less demanding than the deception task used in the third study presented in this thesis. This hypothesis, however, would need to be empirically assessed before any firm conclusions could be made.

Conclusion

While the concept of sampling variation is well understood by most researchers in the field of deception detection, previous studies have failed to account for the multiple sources of sampling variation present in the typical experimental designs used to investigate the area. As our simulations show, this oversight can have serious consequences, with the typical analysis method found in the literature (by-receiver ANOVA) yielding unacceptably high Type 1 error rates. While the fact that ignoring systematic variation between stimulus items results in inflated Type 1 error rates has been known for some time, the recent research suggesting that sender effects are influential sources of variation in veracity judgements brings the problem into sharp relief for deception researchers. Indeed, the finding that the empirical Type 1 error rate may be over six times larger than the nominal level calls into question the reliability of previously reported findings where the data has been analysed using traditional methods.

It is important to point out that the error rates reported in this chapter are specific to the experimental design used in the Monte Carlo simulations. Other experimental designs handle variance in different ways which may influence the results. Most notably, repeated measures designs should be used wherever possible as individual differences in the repeated measure are balanced across conditions. This may be particularly useful when the repeated measure is applied to the factor with the greatest variance (the senders). While the use of repeated measures designs reduces the probability of Type 1 errors, statistical models must still account for both the variability between receivers and the variability between senders if the researcher wishes to make inferences across both populations. Treating a factor as fixed when it is clearly sampled from a larger population misrepresents the true state of the data. Simply stated, where there is sampling, there is sampling error that needs to be accounted for.

Another important consideration with regard to sampling errors is that they typically decrease as the number of sampled units increases. In the context of deception research, this means that researchers should acknowledge the multi-dimensional structure of their designs and consider not only the representativeness of their sample of receivers, but also the representativeness of their sample of senders. In other words, the precision of parameter estimates depends on the number of senders, the number of messages per sender, the number of receivers and the number of ratings per receiver. While the precise probability of a Type 1 error depends on the underlying population parameters and how sampled participants are assigned to the levels of the independent variables, data aggregation remains problematic. When it is conducted the researcher runs the risk of capitalizing on chance variation in the data. By avoiding data aggregation, generalized linear mixed models provide numerous analytical and statistical advantages compared to traditional techniques. Specifically, they are capable of jointly modelling variation between receivers and senders, meaning that a significant result is likely to replicate with new samples of receivers and senders.
In other words, they more appropriately model the data structure, thus holding error rates within the prescribed limits and affording greater statistical power and greater generalization to the populations of interest. This should be one of the primary goals of research in the field of deception detection.

The re-analysis of the Study 3 data using a series of GLMMs yielded similar results to those obtained from the ANOVA procedure (presented in the previous chapter). The benefit of the re-analysis is that it allows inferences to extend to the population of receivers, as well as the population of senders. That is, the results from the series of GLMMs support the conclusion that the increased detectability in the OSPAN condition was a function of the OSPAN task itself and not a function of the particular sample of participant-receivers who judged the messages for veracity. As the previous analysis did not incorporate the random sampling error associated with the participant-receivers, the inferences drawn from the results were restricted to the particular sample of participant-receivers used in the study. While the results from the series of GLMMs were consistent with those of the ANOVA procedure, there is less uncertainty regarding the effect size estimates produced by the GLMMs than with the estimates produced by the ANOVA model. This improvement in statistical power revealed an intriguing finding, namely that the effect observed among the deceptive messages in the STM-6 condition was marginally significant (the lower bound of the 95% CI was only slightly below 0). This finding is consistent with the idea that the general demands placed on the executive system during an act of deception are sufficient to influence the diagnosticity of behavioural displays, and thus receiver perceptions of credibility, but that the demands placed on the working memory system are larger and are more causally relevant to the diagnosticity of behavioural displays.


Chapter 9: General discussion

SECTION 4: GENERAL DISCUSSION

Chapter 9

Good Liars and Poor Liars

This thesis presents a series of empirical investigations into the executive demands of deception. The first two studies used correlation and regression techniques to investigate whether the executive demands of deception are causally relevant to the diagnosticity of behavioural displays, and thus receiver perceptions of credibility. Specifically, the first two studies investigated whether individual differences in executive abilities (working memory updating, inhibitory control, set shifting) predicted individual differences in sender detectability. These studies involved around 363 hours of face-to-face testing with 147 participant-senders (split over two experimental testing sessions) and around 441 hours of online testing with 1,764 participant-receivers. The third study extended the correlational findings of the first two studies by using a resource depletion framework to investigate whether liars with experimentally depleted executive resources (primarily working memory resources) became more detectable (worse liars). This study involved around 285 hours of face-to-face testing with 114 participant-senders and around 599 hours of online testing with 798 participant-receivers. Across the three empirical studies reported in this thesis, more than 21,000 veracity judgements were analysed.

This thesis also presents a theoretical analysis of the different statistical methods commonly used to analyse deception data. This analysis brings to light several limitations associated with traditional approaches and provides an introduction to a more appropriate analytical procedure that overcomes these limitations. One thousand Monte Carlo simulation studies were conducted to assess the efficacy of the different approaches, with each set of outcomes analysed via three different statistical procedures (3,000 analyses in total).
This chapter summarises the methods and results of the research presented in this thesis and discusses the implications of the findings for deception theory and research. This chapter also discusses the practical applications of the work, its significance and innovation, some of the associated limitations, and recommendations regarding future research in the area. Finally, several conclusions regarding the thesis aims and research questions are offered.


Summary of Methods and Results

In the first two studies reported in this thesis, participant-senders were video and audio recorded while producing messages regarding controversial socio/political issues. Of these messages, some were produced under conditions where participant-senders were unaware that their messages were going to be evaluated for veracity and where they were given no instructions on how they should behave during the interviews (naïve truth condition), some were produced under conditions where participant-senders were informed that their messages were going to be evaluated for veracity and where they were instructed to try to appear as credible as possible (informed truth condition), and some were produced under conditions where participant-senders were instructed to provide false opinions while trying to appear as credible as possible (deceptive condition). After each message, participant-senders completed three self-report measures. These measures were designed to assess how much emotional activation, cognitive load, and behavioural control participant-senders experienced during the different conditions. Finally, to operationalize participant-sender performance in the false opinion task, samples of participant-receivers evaluated the participant-sender messages for veracity (truth vs. lie).

While the experimental methods of the first and second studies were similar, the second study contained several modifications that were designed to remedy/minimize some of the limitations of the first study. Specifically, in addition to the original self-report measures, participant-senders in the second study also completed three supplementary self-report measures at the end of the false opinion task (once they had completed the naïve truth, informed truth, and deceptive conditions).
These supplementary self-report measures were designed to further assess how levels of emotional activation, cognitive load, and behavioural control varied across the three experimental conditions and whether some of the findings observed in the first study reflected a ceiling effect in the original self-report measures.

The interview protocols of the second study also differed from those of the first study. In the second study, each participant-sender’s initial response was followed up by several probe questions. These probe questions were designed to increase the cognitive complexity of the false opinion task so as to provide participant-receivers with more information on which to base their veracity judgements. Furthermore, the opinion survey used in the second study only included opinion items where receivers perceived a sender response in agreement to be approximately as likely as a sender response in disagreement. These modifications served to decrease the amount of measurement error associated with the sender performance measures.

The results from the self-report measures were consistent with the idea that high levels of impression management are not an exclusive feature of deceptive communication. Specifically, the average self-report ratings of behavioural control for the informed truth conditions (study 1 M = 6.62; study 2 M = 6.23) were significantly higher than the average self-report ratings for the naïve truth conditions (study 1 M = 4.73; study 2 M = 5.15). This finding was also observed on the supplementary self-report measure used in the second study (informed truth M = 84.73; naïve truth M = 55.04). Importantly, for both the original and supplementary self-report measures of behavioural control, the average ratings for the informed truth conditions were near the top anchors of both scales, as were the average self-report ratings for the deceptive conditions (study 1 M = 6.56; study 2 M = 6.33; supplementary measure M = 88.67). These results support the idea that people who are not taking their credibility for granted and who are trying to foster a credible impression tend to engage in near maximum levels of behavioural control, regardless of whether they are telling the truth or lying.

The results from the self-report measures were also consistent with the idea that engaging in high levels of impression management tends to increase the amount of cognitive load senders experience. Specifically, the average self-report ratings of cognitive load for the informed truth conditions (study 1 M = 4.58; study 2 M = 5.77) were significantly higher than the average self-report ratings for the naïve truth conditions (study 1 M = 3.90; study 2 M = 4.90).
Like the self-report measure of behavioural control, this finding was also observed on the supplementary self-report measure (informed truth M = 50.92; naïve truth M = 37.11). Moreover, producing a false opinion appeared to further increase perceptions of cognitive load. In the first study, the average self-report rating of cognitive load for the deceptive condition (M = 5.54) was significantly higher than the average self-report rating for the informed truth condition, with this difference (ηp² = .16) slightly greater than the difference between the naïve truth and informed truth conditions (ηp² = .11). In the second study, however, the opposite pattern was observed, with the difference between the deceptive (M = 6.07) and informed truth conditions (ηp² = .05) smaller than the difference between the naïve truth and informed truth conditions (ηp² = .16). Importantly, the average ratings for the informed truth and deceptive conditions of the second study were both near the top anchor of the self-report scale, hence it was thought that they could have been truncated by a ceiling effect in the original self-report measure. Indeed, the addition of the probe questions appeared to increase the overall amount of cognitive load perceived by senders such that the original self-report scale was no longer sensitive to differences across the informed truth and deceptive conditions. This interpretation was supported by the results from the supplementary self-report measure of cognitive load. These results mirrored the findings of the first study; namely that the difference between the average self-report ratings of cognitive load for the deceptive (M = 73.08) and informed truth conditions (M = 50.92; ηp² = .23) was greater than the difference between the self-report ratings for the naïve truth (M = 37.11) and informed truth conditions (ηp² = .17). When these results are taken together, they support the idea that the cognitive costs associated with deceptive message production are greater than those associated with impression management.

Finally, the results from the self-report measures were also consistent with the idea that the addition of probe questions increases the amount of emotional activation senders experience. In the first study, the deceptive condition (M = 5.62) induced significantly higher average levels of self-reported emotional activation than the naïve truth (M = 4.94) and informed truth (M = 4.73) conditions. In the second study, however, the deceptive condition (M = 5.98) only induced slightly higher average levels of self-reported emotional activation than the naïve truth (M = 5.59) and informed truth (M = 5.85) conditions. This finding was consistent with the results of the supplementary measure (naïve truth M = 45.26; informed truth M = 49.98; deceptive M = 56.38).
While the average self-report ratings of emotional activation for the deceptive condition were approximately the same across the two studies, the average self-report ratings for the naïve truth and informed truth conditions were significantly higher in the second study than in the first. This increase in emotional activation during the naïve truth and informed truth conditions is thought to explain the smaller effect sizes observed in the second study.

To assess whether the executive demands of deception are sufficient to influence the diagnosticity of behavioural displays, participant-senders in both the first and second studies also completed several executive control tasks. These tasks were designed to measure aspects of their executive abilities (working memory updating, inhibitory control, and set shifting). While participant-senders in the first study completed three executive tasks, participant-senders in the second study completed three sets of executive tasks (nine tasks total). By having participant-senders complete multiple tasks that were designed to tap the same underlying executive ability, the covariances within each set of tasks could be used to more accurately estimate each participant-sender’s true executive abilities (through latent variable analysis). This approach proved fruitful and yielded improved estimates of each participant-sender’s true executive abilities.

The research question pertained to whether the outcomes from the executive tasks were related to sender detectability scores and whether the magnitude of the relationship depended on which truthful condition (naïve truth vs. informed truth) was used to operationalize sender performance. While the results from the first study were encouraging, with several of the predicted relationships observed (poor working memory updating was associated with slightly higher naïve (r = .20) and informed (r = .24) detectability scores while poor set shifting was associated with slightly higher naïve (r = .23) detectability scores), the results from the second study provided stronger evidence. Specifically, a series of structural equation models revealed that once individual differences in working memory updating had been taken into account, inhibitory control and set shifting skills were unrelated to sender detectability scores. Working memory updating, on the other hand, consistently accounted for a significant amount of the variability in sender detectability scores (around 18% of the variability in naïve detectability scores and around 16% of the variability in informed detectability scores).
This finding was observed irrespective of whether working memory updating was modelled in isolation or with inhibitory control and set shifting. These results support the idea that the demands deception places on the working memory system are sufficient to exceed most people’s ability to process the demands without observable signs of cognitive strain. Furthermore, the results support the idea that receivers are sensitive to signs of cognitive strain and tend to correctly associate them with deception. In other words, people with poor working memory updating skills tend to be bad liars.

Another important finding regarding the relationship between working memory updating skills and sender detectability scores is that the magnitude of the relationship did not appear to depend on which type of truthful message was used to operationalize sender performance (naïve truth vs. informed truth). This supports the idea that while people tend to find controlling their expressive behaviours cognitively demanding, the demands are either not related to the executive system or are not sufficiently taxing of the executive system to influence the diagnosticity of behavioural displays. Conversely, the demands associated with deceptive message production do appear to be sufficiently taxing of the working memory system to cause people to display signs of cognitive strain during deception.

The third study reported in this thesis sought to extend the findings of the first two studies by using a resource depletion framework to assess whether liars with experimentally depleted executive resources (primarily working memory resources) become more detectable (worse liars). As in the first two studies, participant-senders were video and audio recorded while they produced messages regarding controversial socio/political issues. Once participant-senders had recorded a truthful message, they completed a short buffer task (listening to music for 10 minutes), after which they completed one of three cognitive tasks (STM-2, STM-6, or OSPAN) that had been previously shown to differentially manipulate the availability of executive resources. Participant-senders then provided another truthful message immediately after they had completed the cognitive task. This procedure was then repeated by the participant-senders, except that the messages the second time around were deceptive rather than truthful. As in the first two studies, participant-sender performance was operationalized by having a sample of participant-receivers evaluate the participant-sender messages for veracity (truth vs. lie).
The results of the third study indicated that while the average accuracy rates for the truthful and deceptive messages produced after the administration of the STM-2 (truthful M = 57.2%; deceptive M = 52.9%) and STM-6 tasks (truthful M = 56.4%; deceptive M = 53.6%) were approximately the same as those for the truthful and deceptive messages produced before the administration of the tasks (truthful STM-2 M = 58.6%; deceptive STM-2 M = 52.4%; truthful STM-6 M = 58.2%; deceptive STM-6 M = 51.1%), the average accuracy rate for the deceptive messages produced after the administration of the OSPAN task (M = 59.6%) was significantly higher than the average accuracy rate for the deceptive messages produced before the administration of the OSPAN task (M = 50.7%). That is, deceptive messages produced by senders with depleted executive resources were around 8.95% more detectable than those produced under control conditions. No differences were observed across the truthful messages in the OSPAN condition (before M = 58.6%; after M = 58.0%). When these results are interpreted within a resource depletion framework, they support the idea that deception performance depends on the amount of available executive resources. That is, when there are fewer executive resources available, the executive demands of deception are processed less efficiently, resulting in more diagnostic behavioural displays, and thus decreased perceptions of credibility. Furthermore, the fact that no differences were observed among the truthful messages provides support for the idea that the executive demands associated with deceptive message production are higher than those associated with truthful message production.

The fourth study presented in this thesis focused on a theoretical analysis of the different statistical procedures commonly used to analyse deception data and used Monte Carlo simulations to demonstrate the inherent biases that exist in participant-level data. These simulations demonstrated that under certain conditions, empirical Type 1 and Type 2 error rates associated with main effects testing may be as high as 35% when data is aggregated by-receiver and as high as 60% when data is aggregated by-sender, respectively. When decision-level data is modelled directly, however, error rates may be close to nominal levels (6% and 28%, respectively). In light of these findings, it was considered prudent to re-analyse the results of the third study using the more appropriate statistical technique. The results of the re-analysis were almost entirely consistent with the earlier analysis, reaffirming the importance of working memory skills in deception.
Interestingly, the effect in the STM-6 condition was larger in the re-analysis, with the probability of correctly identifying a deceptive message produced after the administration of the STM-6 task 4.80% higher than the probability of correctly identifying a deceptive message produced before the administration of the STM-6 task. While the level of the evidence was insufficient to make any firm conclusions, this finding suggests that attentional resources (common executive control resources) may also be important to the ability to deceive.
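
The aggregation problem described above for the fourth study can be illustrated with a small simulation. The sketch below is not the simulation reported in this thesis; it is a minimal illustration under assumed values (the function name, 8 senders per condition, 100 receivers, a base accuracy of .55, and the sender standard deviation are all hypothetical). Two conditions with no true difference each use their own small set of senders, every receiver judges every message, and the data are aggregated by receiver before a paired t-test is run.

```python
import numpy as np

def by_receiver_type1_rate(sender_sd, n_sims=500, n_receivers=100,
                           n_senders=8, base_rate=0.55, seed=0):
    """Proportion of null experiments declared significant after
    aggregating decision-level data by receiver (all values hypothetical)."""
    rng = np.random.default_rng(seed)
    # Two-tailed .05 critical t for 99 df; only valid for n_receivers=100.
    t_crit = 1.984
    rejections = 0
    for _ in range(n_sims):
        # Each condition has its own senders; there is NO true condition effect.
        # Senders differ only through random variation in how detectable they are.
        p_a = np.clip(base_rate + sender_sd * rng.standard_normal(n_senders), 0.05, 0.95)
        p_b = np.clip(base_rate + sender_sd * rng.standard_normal(n_senders), 0.05, 0.95)
        # Every receiver judges every message (True = correct judgement),
        # then judgements are averaged within receiver and condition.
        acc_a = (rng.random((n_receivers, n_senders)) < p_a).mean(axis=1)
        acc_b = (rng.random((n_receivers, n_senders)) < p_b).mean(axis=1)
        # Paired t-test across receivers, which ignores sender sampling.
        diff = acc_a - acc_b
        t = diff.mean() / (diff.std(ddof=1) / np.sqrt(n_receivers))
        rejections += abs(t) > t_crit
    return rejections / n_sims

if __name__ == "__main__":
    print("Type 1 rate, senders vary:   ", by_receiver_type1_rate(sender_sd=0.15))
    print("Type 1 rate, senders uniform:", by_receiver_type1_rate(sender_sd=0.0))
```

When senders vary in detectability, the by-receiver test treats the sampled senders as fixed, so chance differences between the two sender samples are mistaken for a condition effect and the empirical Type 1 error rate rises well above the nominal 5%. When all senders are equally detectable, the same test stays near the nominal level. Modelling decision-level data with both senders and receivers as random factors is the kind of remedy the fourth study points to.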


Implications for Deception Theory and Research

The work presented in this thesis has several implications for deception theory and previous research. Specifically, the finding that informed detectability scores tend to be smaller than naïve detectability scores implies that the messages produced in the informed truth conditions tended to be perceived as less credible than the messages produced in the naïve truth conditions (both the naïve and informed detectability scores used the same deceptive conditions in calculations). This supports the idea that when truth-tellers are motivated to be believed they tend to be perceived as less credible than when they are not particularly motivated to be believed. Such a finding is consistent with DePaulo’s (1992) Self-Presentational Perspective, which predicts that as levels of behavioural control increase, perceptions of credibility decrease. Moreover, the finding is inconsistent with Buller and Burgoon’s (1996) Interpersonal Deception Theory, which predicts the opposite pattern of results (as levels of behavioural control increase, perceptions of credibility increase).

The above finding also has implications for the Motivational Impairment Effect (MIE) literature. While previous research conducted by DePaulo and colleagues demonstrated that motivational manipulations tend to cause an increase in the discriminability of truthful and deceptive messages (see DePaulo et al., 1983; DePaulo et al., 1985; DePaulo et al., 1988; DePaulo et al., 1991), the mechanism behind this increase in discriminability was unclear (whether the observed increases were driven by changes in truthful or deceptive performances). The results from the first two studies reported in this thesis support the idea that the previously observed increases in discriminability were likely driven by a change in both truthful and deceptive performances.
Specifically, if high levels of sender motivation cause truthful performances to decrease (as observed), an increase in discriminability can only be observed when high levels of sender motivation cause deceptive performances to decrease more than truthful performances. That is, high levels of sender motivation must impair deceptive performances more than they impair truthful performances. Importantly, this interpretation supports the original conclusions put forward by DePaulo and colleagues; namely that a high level of sender motivation tends to impair deception performance.

With regard to the cognition of deception, several theories include mechanisms whereby lying is thought to induce more cognitive load than truth-telling. Zuckerman et al.’s (1981) multi-factor theory of deception, Buller and Burgoon’s (1996) Interpersonal Deception Theory (IDT), and DePaulo’s (1992) Self-Presentational Perspective (SPP) all contend that the increased cognitive demands of deception stem from the additional tasks liars must engage in to conceal their deception. These additional tasks consist of those associated with impression management and those associated with deceptive message production. While the results reported in this thesis are consistent with the idea that impression management tasks contribute to the higher levels of cognitive load that are generally reported by liars (see Caso et al., 2005; Gozna et al., 2001; Vrij et al., 2001; Vrij & Mann, 2006; Vrij et al., 2006; Vrij et al., 1996; Wright et al., 2012), the tasks associated with impression management do not appear to be as cognitively demanding as those associated with deceptive message production. Moreover, the results of this thesis support the idea that while the demands associated with impression management are either not related to the executive system or are insufficient to influence the diagnosticity of behavioural displays, the demands associated with deceptive message production are related to the executive system (particularly the working memory system) and are sufficient to exceed most people’s ability to process the demands without observable signs of cognitive strain. As such, the emphasis in the above theories of deception should be shifted towards the cognitively demanding nature of deceptive message production and away from the demands associated with impression management.
While the multi-factor model of deception, IDT, and the SPP tend to emphasize the demands associated with impression management, more contemporary cognitive models of deception place a greater emphasis on the demands associated with deceptive message production, such as Sporer and Schwandt’s (2006; 2007) working memory model of deception and Walczyk et al.’s (2014) recently proposed Activation-Decision-Construction-Action Theory (ADCAT) of deception. In ADCAT, truth-telling and deception are both thought to involve the same four basic underlying cognitive processes (activation, decision, construction, and action). The theory predicts that whichever message (truthful vs. deceptive) requires a more complex response (greater construction) and/or more explicit memory searching (greater activation) will induce higher levels of cognitive load. While the false opinion task used in this thesis was not designed to tease apart these processes, participant-senders did not have to make any decisions regarding the level of honesty/dishonesty to be conveyed during the interviews (this was determined by the cue cards they were given) and the way the messages were delivered (the action component) was held relatively stable across the different conditions (the interview protocol was standardized). As such, the main difference between the truthful and deceptive messages used in the false opinion task pertains to the activation and construction components of the model.

Importantly, the inclusion of the devil’s advocate questions meant that for both truthful and deceptive messages, senders had to activate opinion-consistent and opinion-inconsistent information at different times during the interviews. The results from study two suggest that when the message was deceptive, participant-senders may have put more effort into activating opinion-inconsistent information and/or less effort into activating opinion-consistent information than when the message was truthful (perceptions of cognitive load were higher in the deceptive condition than in both of the truthful conditions). This difference in activation (explicit memory searching) may have been caused by liars activating both opinion-inconsistent and opinion-consistent information during the opinion-consistent part of the interview (the first few questions). That is, when asked to justify their opinion, opinion-consistent information may automatically be activated in long-term memory and transferred to working memory. When the message is truthful, the opinion-consistent information simply needs to be organized into a coherent narrative. When the message is deceptive, however, opinion-inconsistent information must also be activated and transferred to working memory during this stage of the interview. Liars must then hold and suppress the already activated opinion-consistent information in working memory while organizing the opinion-inconsistent information into a coherent narrative.
The results from the receiver veracity judgements support the idea that this process causes liars to display more observable signs of cognitive strain at inappropriate times. Specifically, the results suggest that liars may have shown more signs of cognitive strain during the purported opinion-consistent part of the message (when receivers might expect fewer signs of cognitive strain) than during the purported opinion-inconsistent part of the message (when receivers might expect more signs of cognitive strain). Truth-tellers, on the other hand, may have shown the opposite pattern. This interpretation is consistent with the findings reported by Leal, Vrij, Mann, and Fisher (2010) and supports Walczyk et al.’s (2014) assertion that messages which involve greater activation (explicit memory searching) and/or greater construction generally induce higher levels of cognitive load.


Importantly, while the results of this thesis also support Walczyk et al.’s (2014) assertion that the cognitive demands of deception are related to the working memory system, they also extend the theory by demonstrating that the above process is not innocuous and that differences in activation and/or construction tend to manifest at the behavioural level.

This thesis also provides insight into the theoretical mechanism of action behind recent cognitive approaches to deception detection. Previous research conducted by Vrij and colleagues has demonstrated that increasing the cognitive demands of an interview causes deception performance to decrease while truthful performance remains relatively stable (see Vrij et al., 2008; Vrij et al., 2009; Vrij et al., 2010; Vrij et al., 2012). While the results of the third study reported in this thesis confirm this finding, the difference between the effects in the STM-6 and OSPAN conditions supports the idea that both general executive resources and specific working memory resources are required to successfully deceive. That is, while common attentional resources may contribute to deception performance (the effect was marginal), working memory resources contribute to deception performance over and above any effects associated with common executive resources. This is an important finding as it suggests that cognitive approaches to deception detection may be more effective if they specifically target working memory resources rather than general attentional resources.

The results reported in this thesis also have important implications for the neuroimaging of deception.
While much of this research points to the executive control system as important to deception (see Abe, 2009; Abe, 2011; Gombo, 2006; Johnson et al., 2004; Langleben et al., 2005; Spence et al., 2001; Spence et al., 2004), neuroimaging studies by themselves do not provide evidence regarding the causal relevance or functional contribution of this system to the behavioural correlates of deception. Although future research in this area is needed to identify the boundary conditions under which the cognitive demands of different types of deception become causally relevant to the diagnosticity of behavioural displays, the results reported in this thesis demonstrate not only that producing false opinions requires executive resources (particularly working memory updating resources, as implicated by neuroimaging research), but also that the amount of resources required often exceeds the amount available (the false opinion task is resource-limited when outcomes are measured at the level of receiver perceptions of credibility).


Practical Implications

The results reported in this thesis have several practical implications. Specifically, the finding that people with poor working memory skills tend to be worse liars than those with good working memory skills suggests that the cognitive approach to deception detection may be more effective when it is applied to those with poor working memory skills. These people already appear to have difficulty processing the cognitive demands of deception in an innocuous manner, thus an increase in task demands would further debilitate deception performance. People with good working memory skills, on the other hand, appear to process the cognitive demands associated with deception more effectively, thus an increase in task demands may only slightly debilitate deception performance (or not at all). When this is the case, it may be more beneficial to employ other deception detection methods.

While the current thesis does not attempt to map the cognitive demands associated with different interview protocols or different types of lies (only false opinions were investigated), it does demonstrate that an individual differences approach is capable of determining whether certain cognitive abilities are more or less taxed under different conditions. If this approach were applied to a wider range of interview protocols and different types of lies, it may be possible to match interview protocols to the specific cognitive strengths and weaknesses of the interviewees to maximize detection accuracy. Such a process may involve conducting brief psychometric testing with an interviewee and using the results to determine which interview protocol is more likely to be effective.

Having interviewees complete a battery of cognitive tasks before an interview may not only provide insights into which interviewing strategy may be of most benefit, but the administration of the tasks themselves may be of benefit.
The third study reported in this thesis demonstrated that people who completed an executive task before they produced a false opinion had impaired deception performances. When this approach is applied in conjunction with certain cognitively demanding interview protocols, the effects may interact with each other, resulting in a further decrease in deception performance.


Significance and Innovation

This thesis extends the emerging literature on the role of executive control processes in deception and provides a valuable contribution to the study of deception. It provides the first empirical tests assessing whether the executive demands of deception are causally relevant to the diagnosticity of behavioural displays. In this respect, the current thesis bridges the gap between the research on the cognition of deception (the neuroimaging research and the research which uses specialized tools with reaction time based measures of task performance) and the research investigating individual difference factors (the research regarding the behavioural correlates of deception and the research regarding perceptions of credibility). This thesis also discriminates between the cognitive costs associated with impression management and the costs associated with deceptive message production. This is an important consideration as impression management is not an exclusive feature of deceptive communication.

This thesis is also the first to apply a resource depletion framework to investigate the role of executive control processes in deception. This is an important contribution to the field as the framework can be used in future research to establish the causal relevance of the different aspects of executive control to different types of deception. By establishing which executive resources are consumed during certain acts of deception, targeted interventions can be developed which seek to deplete specific executive resources in an effort to disrupt people’s ability to deceive.

Finally, the theoretical analysis of the different statistical procedures commonly used to analyse deception data is another important contribution of this thesis. The analysis is the first to highlight the problems associated with data aggregation in the analysis of deception data.
It also provides a solution to the problems caused by data aggregation, thus providing deception researchers with more powerful and appropriate procedures for use in future research.


Limitations and Future Directions

An important consideration regarding the generalizability of most deception research is whether laboratory based paradigms elicit the same underlying psychological and emotional processes as those involved in real world instances of deception. In particular, the lack of meaningful consequences for successful and/or unsuccessful deception in laboratory based research has called into question participants’ motivation to succeed, with participants presumably less motivated to succeed in laboratory based paradigms than in more serious real world instances of deception (Frank & Feeley, 2003; Porter & ten Brinke, 2010; Wright Whelan, Wagstaff, & Wheatcroft, 2014; Wright et al., 2013). This difference in motivation may serve to decrease the amount by which deceptive statements can be differentiated from truthful statements as participants in laboratory studies may not experience much emotional activation and/or cognitive load during deception, and thus they may not display the behavioural cues associated with such activation. Alternatively, this difference in motivation may serve to increase the amount by which deceptive statements can be differentiated from truthful statements as participants in laboratory studies may not try very hard to control their expressive behaviours during deception.

The results from the self-report measures, however, seem to rule out the latter hypothesis. Specifically, the results from the self-report measures indicated that while participant-senders did not experience much variation in emotional activation across the naïve truth, informed truth, and deceptive conditions, they did experience a fair amount of variation in cognitive load across the conditions. This suggests that while the false opinion task used in the current experiments likely produced low-stakes lies, arguably diminishing emotional deception cues, the cognitive elements associated with deception were still sufficient to evoke cognitive cues.
It may be the case that managing higher levels of emotional activation increases the demands associated with impression management, which may increase the magnitude of the effects reported in this thesis. Future research could test this hypothesis by increasing the motivational incentives for the informed truth and deceptive conditions.

Another limitation of the current research is that only one type of lie was assessed across the three laboratory based investigations. As previous research has demonstrated that different types of lies tend to engage distinct cognitive processes (Ganis et al., 2003; Morgan et al., 2009), the results reported in this thesis may not generalize to other types of deception. For instance, the participant-senders in the studies reported in this thesis were not given any time to prepare their messages beforehand. This design feature was included so that the construction of the messages would contribute to the cognitive demands of the false opinion task. If the participant-senders had been given a forewarning about which opinion item would be the focus of the interview and given time to prepare their answer, it is foreseeable that the demands placed on the working memory system would have been lower. In this instance, different executive control processes, such as inhibitory control, may have been more strongly related to sender performance. Likewise, when people tell lies about factual events, rather than about opinions/beliefs, different cognitive processes may be more important to the success of a deceptive message. To investigate this possibility, future research should use other lie paradigms to assess whether the relationships observed in this thesis depend on the type of interview protocol used and whether they depend on the type of deception.

A particular question that remains unanswered in this thesis pertains to the behavioural displays of the senders. While the finding that senders with poor working memory abilities are more detectable than senders with good working memory abilities speaks to the occurrence of behavioural differences, the precise nature of the behavioural differences remains speculative. That is, while there is strong evidence for behavioural differences, this thesis does not identify where these differences occur (body cues, facial cues, speech content etc.). Future research can address this limitation by having trained raters code the occurrence of certain behavioural displays in each sender’s messages.
The coded behavioural cue scores could then be used as indicators in a mediational analysis to determine the relationship between executive abilities, behavioural displays, and perceptions of credibility. Future research would also benefit from using a community sample of participant-senders, rather than undergraduate students. As undergraduate students presumably have a restricted range of executive abilities, with relatively few students having very poor working memory abilities, the estimates reported in this thesis may actually underestimate the effects in the general population. That is, people with very poor working memory abilities may be more impaired by the cognitive demands of deception than some of the worst liars sampled in this thesis.


Another area for future research to explore would be the effects that different executive tasks have on subsequent deception performance. To further investigate whether decreased perceptions of credibility are caused by poor working memory abilities, future research could have participant-senders complete either an inhibitory control or set shifting task before a deceptive act. While the different executive tasks may all deplete a common pool of executive resources, they may differentially deplete the specific executive resources associated with the three somewhat distinct executive processes. As such, a comparison of the different effect sizes would afford an assessment of the degree to which each distinct process was involved in deception. The logic of this procedure also works the other way around, with deception tasks preceding executive control tasks. If deception differentially engages and consumes specific executive resources, then performance on subsequent executive control tasks that also require that specific resource would be expected to be lower than a baseline measurement. Provided this effect is not observed with truthful messages, a comparison of baseline and post-interview executive performance may provide useful diagnostic information. As this procedure has the potential to be more sensitive to resource depletion effects, it could not only be used to further investigate the cognitive demands of deception, but may also be useful in determining whether a person was telling the truth or lying.


Conclusions

Overall, the results from the three empirical studies investigating the executive demands of deception and the results from the theoretical analysis/Monte Carlo simulations investigating the different statistical methods commonly used to analyse deception data support four key conclusions:

1. The executive demands of deception are causally relevant to the diagnosticity of behavioural displays, and thus receiver perceptions of credibility. Specifically, people with poor working memory abilities are less able to process the cognitive demands of deception than people with good working memory abilities. This causes people with poor working memory abilities to display more observable signs of cognitive strain than people with good working memory abilities. Importantly, receivers are sensitive to these differences in behavioural displays and tend to correctly associate signs of cognitive strain with deception. Ultimately, people with poor working memory abilities are bad liars.

2. The cognitive demands associated with impression management are either unrelated to the executive system or are insufficient to influence the diagnosticity of behavioural displays, and thus receiver perceptions of credibility. The cognitive demands associated with deceptive message production, on the other hand, are related to the executive system (particularly the working memory system) and are sufficient to exceed most people’s ability to process the demands without observable signs of cognitive strain.

3. A resource depletion framework is a useful approach to investigate the cognitive demands associated with different interview protocols and different types of deception. Moreover, depleting executive resources (particularly working memory resources) before an act of deception impairs deception performance.

4. Research that uses multiple samples of participants (participant-senders and participant-receivers) must account for the various dependencies present in outcome data. Failure to do so increases the risk of Type 1 errors. Generalized Linear Mixed Models are better suited to the analysis of deception data.
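
The Type 1 error inflation described in the fourth conclusion can be illustrated with a minimal simulation sketch. It is not the analysis used in this thesis. It assumes a simplified design in which receivers are nested within senders, there is no true difference between two groups of senders, and each judgement is a continuous score made up of a sender-specific offset plus noise. All names and parameter values (`simulate`, `two_sample_t`, `sender_sd`, `noise_sd`, the group sizes) are hypothetical, and aggregating judgements to sender means is used only as a simple stand-in for the Generalized Linear Mixed Models recommended above.

```python
import math
import random
import statistics


def two_sample_t(a, b):
    """Two-sample t statistic (pooled and Welch forms coincide for equal group sizes)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va / len(a) + vb / len(b))


def simulate(n_sims=500, senders_per_group=10, receivers_per_sender=12,
             sender_sd=1.0, noise_sd=1.0, seed=1):
    """Return (naive, sender_level) false positive rates under a true null effect."""
    rng = random.Random(seed)
    crit_naive = 1.97    # two-tailed .05 critical t, df = 238 (240 judgements)
    crit_sender = 2.101  # two-tailed .05 critical t, df = 18 (20 sender means)
    naive_hits = 0
    sender_hits = 0
    for _ in range(n_sims):
        groups = []
        for _group in range(2):
            senders = []
            for _sender in range(senders_per_group):
                offset = rng.gauss(0, sender_sd)  # shared by all judgements of this sender
                senders.append([offset + rng.gauss(0, noise_sd)
                                for _ in range(receivers_per_sender)])
            groups.append(senders)
        # Naive analysis: every judgement is treated as an independent observation.
        flat = [[score for sender in group for score in sender] for group in groups]
        # Sender-level analysis: one mean per sender, so the unit of analysis is the sender.
        means = [[statistics.fmean(sender) for sender in group] for group in groups]
        naive_hits += int(abs(two_sample_t(flat[0], flat[1])) > crit_naive)
        sender_hits += int(abs(two_sample_t(means[0], means[1])) > crit_sender)
    return naive_hits / n_sims, sender_hits / n_sims


if __name__ == "__main__":
    naive_rate, sender_rate = simulate()
    print(f"naive: {naive_rate:.3f}  sender-level: {sender_rate:.3f}")
```

Under these assumed settings, the naive test should reject the true null hypothesis far more often than the nominal 5% level, whereas the sender-level test should stay close to it. Designs in which receivers are crossed with senders introduce further dependencies that this sketch does not model.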


REFERENCES

Abe, N. (2009). The neurobiology of deception: Evidence from neuroimaging and loss-of-function studies. Current Opinion in Neurology, 22(6), 594 - 600. doi: 10.1097/WCO.0b013e328332c3cf
Abe, N. (2011). How the brain shapes deception: An integrated review of the literature. The Neuroscientist, 17(5), 560 - 574. doi: 10.1177/1073858710393359
Arbuthnott, K., & Frank, J. (2000). Trail making test, part B as a measure of executive control: Validation using a set-switching paradigm. Journal of Clinical and Experimental Neuropsychology, 22(4), 518 - 528. doi: 10.1076/1380-3395(200008)22:4;1-0;FT518
Agresti, A., & Finlay, B. (1997). Statistical methods for the social sciences (3rd ed.). New Jersey, NJ: Prentice Hall.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modelling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390 - 412. doi: 10.1016/j.jml.2007.12.005
Baddeley, A. D. (1992). Working memory. Science, 255(5044), 556 - 559. doi: 10.1126/science.1736359
Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417 - 423. doi: 10.1016/S1364-6613(00)0138-2
Bagley, J., & Manelis, L. (1979). Effect of awareness on an indicator of cognitive load. Perceptual and Motor Skills, 49(2), 591 - 594. doi: 10.2466/pms.1979.49.2.591
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255 - 278. doi: 10.1016/j.jml.2012.11.001
Bauer, L. O., Strock, B. D., Goldstein, R., Stern, J. A., & Walrath, J. C. (1985). Auditory discrimination and the eye blink. Psychophysiology, 22(6), 629 - 635. doi: 10.1111/j.1469-8986.1985.tb01660.x
Baumeister, R. F., Vohs, K. D., & Tice, D. M. (2007). The strength model of self-control. Current Directions in Psychological Science, 16(6), 351 - 355. doi: 10.1111/j.1467-8721.2007.00534.x


Beilock, S. L., & Carr, T. H. (2005). When high-powered people fail: Working memory and ‘choking under pressure’ in math. Psychological Science, 16(2), 101 - 105. doi: 10.1111/j.0956-7976.2005.00789.x
Berg, E. A. (1948). A simple objective technique for measuring flexibility in thinking. Journal of General Psychology, 39, 15 - 22. doi: 10.1080/00221309.1948.9918159
Bertrand, P. V., & Holder, R. L. (1988). A quirk in multiple regression: The whole regression can be greater than the sum of its parts. The Statistician, 37(4/5), 371 - 374.
Bond, C. F., & DePaulo, B. M. (2006). Accuracy of deception judgments. Personality and Social Psychology Review, 10(3), 214 - 234. doi: 10.1207/s15327957pspr1003_2
Bond, C. F., & DePaulo, B. M. (2008). Individual differences in judging deception: Accuracy and bias. Psychological Bulletin, 134(4), 477 - 492. doi: 10.1037/0033-2909.134.4.477
Bond, G. G., Malloy, D. M., Thompson, L. A., Arias, E. A., & Nunn, S. N. (2004). Post-probe decision making in a prison context. Communication Monographs, 71(3), 269 - 285. doi: 10.1080/0363452042000288328
Broadway, J. M., & Engle, R. W. (2010). Validating running memory span: Measurement of working memory capacity and links with fluid intelligence. Behavior Research Methods, 42(2), 563 - 570. doi: 10.3758/BRM.42.2.563
Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6(1), 3 - 5. doi: 10.1177/1745691610393980
Buller, D. B., & Burgoon, J. K. (1996). Interpersonal deception theory. Communication Theory, 6(3), 203 - 242. doi: 10.1111/j.1468-2885.1996.tb00127.x
Buller, D. B., Strzyzewski, K. D., & Comstock, J. (1991). Interpersonal deception: I. Deceiver's reactions to receivers' suspicions and probing. Communication Monographs, 58(1), 1 - 24. doi: 10.1080/03637759109376211
Bunting, M., Cowan, N., & Saults, J. S. (2006). How does running memory span work? Quarterly Journal of Experimental Psychology, 59(10), 1691 - 1700. doi: 10.1080/17470210600848402


Burgoon, J. K., & Floyd, K. (2000). Testing for the motivation impairment effect during deceptive and truthful interaction. Western Journal of Communication, 64(3), 243 - 267. doi: 10.1080/10570310009374675
Carter, J. D., Farrow, M., Silberstein, R. B., Stough, C., Tucker, A., & Pipingas, A. (2003). Assessing inhibitory control: a revised approach to the stop signal task. Journal of Attention Disorders, 6(4), 153 - 161. doi: 10.1177/108705470300600402
Caso, L., Gnisci, A., Vrij, A., & Mann, S. (2005). Processes underlying deception: An empirical analysis of truths and lies when manipulating the stakes. Journal of Investigative Psychology and Offender Profiling, 2(3), 195 - 202. doi: 10.1002/jip.32
Christ, S. E., Van Essen, D. C., Watson, J. M., Brubaker, L. E., & McDermott, K. B. (2009). The contributions of prefrontal cortex and executive control to deception: Evidence from activation likelihood estimate meta-analyses. Cerebral Cortex, 19(7), 1557 - 1566. doi: 10.1093/cercor/bhn189
Christie, R., & Geis, F. L. (Eds.). (1970). Studies in Machiavellianism. New York, NY: Academic Press.
Clark, H. H. (1973). The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning and Verbal Behavior, 12(4), 335 - 359. doi: 10.1016/S0022-5371(73)80014-3
Cody, M. J., & O’Hair, H. D. (1983). Nonverbal communication and deception: Difference in deception cues due to gender and communicator dominance. Communication Monographs, 50(3), 175 - 192. doi: 10.1080/03637758309390163
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155 - 159. doi: 10.1037/0033-2909.112.1.155
Conway, A. R. A., Kane, M. J., Bunting, M. F., Hambrick, D. Z., Wilhelm, O., & Engle, R. W. (2005). Working memory span tasks: A methodological review and user’s guide. Psychonomic Bulletin & Review, 12(5), 769 - 786. doi: 10.3758/BF03196772


Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates Publishers.
Cole, T. (2001). Lying to the one you love: The use of deception in romantic relationships. Journal of Social and Personal Relationships, 18(1), 107 - 129. doi: 10.1177/0265407501181005
Dawkins, R. (1976). The selfish gene. New York, NY: Oxford University Press.
Debey, E., Verschuere, B., & Crombez, G. (2012). Lying and executive control: An experimental investigation using ego depletion and goal neglect. Acta Psychologica, 140(2), 133 - 141. doi: 10.1016/j.actpsy.2012.03.004
DePaulo, B. M. (1992). Nonverbal behavior and self-presentation. Psychological Bulletin, 111(2), 203 - 243. doi: 10.1037//0033-2909.111.2.203
DePaulo, B. M., & Kashy, D. A. (1998). Everyday lies in close and casual relationships. Journal of Personality and Social Psychology, 74(1), 63 - 79. doi: 10.1037/0022-3514.74.1.63
DePaulo, B. M., Kirkendol, S. E., Tang, J., & O’Brien, T. P. (1988). The motivational impairment effect in the communication of deception: Replications and extensions. Journal of Nonverbal Behavior, 12(3), 177 - 202. doi: 10.1007/BF00987487
DePaulo, B. M., Lanier, K., & Davis, T. (1983). Detecting the deceit of the motivated liar. Journal of Personality and Social Psychology, 45(5), 1096 - 1103. doi: 10.1037/0022-3514.45.5.1096
DePaulo, B. M., Stone, J. I., & Lassiter, G. D. (1985). Telling ingratiating lies: Effects of target sex and target attractiveness on verbal and nonverbal deceptive success. Journal of Personality and Social Psychology, 48(5), 1191 - 1203. doi: 10.1037/0022-3514.48.5.1191
DePaulo, B. M., & Rosenthal, R. (1979). Telling lies. Journal of Personality and Social Psychology, 37(10), 1713 - 1722. doi: 10.1037/0022-3514.37.10.1713
DePaulo, B. M., Kashy, D. A., Kirkendol, S. E., Wyer, M. M., & Epstein, J. A. (1996). Lying in everyday life. Journal of Personality and Social Psychology, 70(5), 979 - 995. doi: 10.1037/0022-3514.70.5.979


DePaulo, B. M., LeMay, C. S., & Epstein, J. A. (1991). Effects of importance of success and expectation for success on effectiveness at deceiving. Personality and Social Psychology Bulletin, 17(1), 14 - 24. doi: 10.1177/0146167291171003
DePaulo, B. M., Lindsay, J. J., Malone, B. E., Muhlenbruck, L., Charlton, K., & Cooper, H. (2003). Cues to deception. Psychological Bulletin, 129(1), 74 - 118. doi: 10.1037/0033-2909.129.1.74
Doherty-Sneddon, G., Bruce, V., Bonner, L., Longbotham, S., & Doyle, C. (2002). Development of gaze aversion as disengagement of visual information. Developmental Psychology, 38(3), 438 - 445. doi: 10.1037/0012-1649.38.3.438
Ekman, P., & Friesen, W. (1969). Nonverbal leakage and clues to deception. Psychiatry: Interpersonal and Biological Processes, 32(1), 88 - 106. doi: 10.1521/00332747.1969.11023575
Engle, R. W., Kane, M. J., & Tuholski, S. W. (1999). Individual differences in working memory capacity and what they tell us about controlled attention, general fluid intelligence, and functions of the prefrontal cortex. In A. Miyake & P. Shah (Eds.), Models of Working Memory: Mechanisms of Active Maintenance and Executive Control (pp. 102 - 134). Cambridge, UK: Cambridge University Press.
Ericsson, K. A., & Kintsch, W. (1995). Long-term working memory. Psychological Review, 102(2), 211 - 245. doi: 10.1037/0033-295X.102.2.211
Exline, R. V., Thibaut, J., Hickey, C. B., & Gumpert, P. (1970). Visual interaction in relation to Machiavellianism and an unethical act. In R. Christie & F. Geis (Eds.), Studies in Machiavellianism (pp. 53 - 75). New York, NY: Academic Press.
Fillmore, M. T. (2003). Drug abuse as a problem of impaired control: current approaches and findings. Behavioral and Cognitive Neuroscience Reviews, 2(3), 179 - 197. doi: 10.1177/1534582303257007
Finkel, E. J., Campbell, W. K., Brunell, A. B., Dalton, A. N., Scarbeck, S. J., & Chartrand, T. L. (2006). High-maintenance interaction: Inefficient social coordination impairs self-regulation. Journal of Personality and Social Psychology, 91(3), 456 - 475. doi: 10.1037/0022-3514.91.3.456


Forster, K. I., & Dickinson, R. G. (1976). More on the language-as-fixed-effect fallacy: Monte Carlo estimates of error rates for F1, F2, F', and min F'. Journal of Verbal Learning and Verbal Behavior, 15(2), 135 - 142. doi: 10.1016/0022-5371(76)90014-1
Frank, M., & Ekman, P. (2004). Appearing truthful generalizes across different deception situations. Journal of Personality and Social Psychology, 86(3), 486 - 495. doi: 10.1037/0022-3514.86.3.486
Frank, M., & Feeley, T. (2003). To catch a liar: Challenges for research in lie detection training. Journal of Applied Communication Research, 31(1), 58 - 75. doi: 10.1080/00909880305377
Friese, M., & Hofmann, W. (2008). What would you have as a last supper? Thoughts about death influence evaluation and consumption of food products. Journal of Experimental Social Psychology, 44(5), 1388 - 1394. doi: 10.1016/j.jesp.2008.06.003
Friese, M., Hofmann, W., & Wänke, M. (2008). When impulses take over: Moderated predictive validity of implicit and explicit attitude measures in predicting food choice and consumption behaviour. British Journal of Social Psychology, 47(3), 397 - 419. doi: 10.1348/014466607X241540
Gailliot, M. T., Schmeichel, B. J., & Baumeister, R. F. (2006). Self-regulatory processes defend against the threat of death: Effects of self-control depletion and trait self-control on thoughts and fears of dying. Journal of Personality and Social Psychology, 91(1), 49 - 62. doi: 10.1037/0022-3514.91.1.49
Ganis, G., Kosslyn, S. M., Stose, S., Thompson, W. L., & Yurgelun-Todd, D. A. (2003). Neural correlates of different types of deception: An fMRI investigation. Cerebral Cortex, 13(8), 830 - 836. doi: 10.1093/cercor/13.8.830
Geis, F. L., & Moon, T. H. (1981). Machiavellianism and deception. Journal of Personality and Social Psychology, 41(4), 766 - 775. doi: 10.1037/0022-3514.41.4.766
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge, UK: Cambridge University Press.


Giammarco, E. A., Atkinson, B., Baughman, H. M., Veselka, L., & Vernon, P. A. (2013). The relation between antisocial personality and the perceived ability to deceive. Personality and Individual Differences, 54(2), 246 - 250. doi: 10.1016/j.paid.2012.09.004
Goldman-Eisler, F. (1968). Psycholinguistics: Experiments in spontaneous speech. New York, NY: Doubleday.
Goldstein, H. (2003). Multilevel statistical methods (3rd ed.). London: Edward Arnold.
Gombos, V. A. (2006). The cognition of deception: the role of executive processes in producing lies. Genetic, Social, and General Psychology Monographs, 132(3), 197 - 214. doi: 10.3200/MONO.132.3.197-214
Gozna, L. F., Vrij, A., & Bull, R. (2001). The impact of individual differences on perceptions of lying in everyday life and in high stakes situation. Personality and Individual Differences, 31(7), 1203 - 1216. doi: 10.1016/S0191-8869(00)00219-1
Green, B., Jr., & Tukey, J. (1960). Complex analyses of variance: General problems. Psychometrika, 25(2), 127 - 152. doi: 10.1007/BF02288577
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York, NY: John Wiley & Sons.
Hagger, M. S., Wood, C., Stiff, C., & Chatzisarantis, N. L. (2010). Ego depletion and the strength model of self-control: A meta-analysis. Psychological Bulletin, 136(4), 495 - 525. doi: 10.1037/a0019486
Hofmann, W., Gschwendner, T., Castelli, L., & Schmitt, M. (2008). Implicit and explicit attitudes and interracial interaction: The moderating role of situationally available control resources. Group Processes & Intergroup Relations, 11(1), 69 - 87. doi: 10.1177/1368430207084847
Hofmann, W., Schmeichel, B. J., & Baddeley, A. D. (2012). Executive functions and self-regulation. Trends in Cognitive Sciences, 16(3), 174 - 180. doi: 10.1016/j.tics.2012.01.006
Inzlicht, M., McKay, L., & Aronson, J. (2006). Stigma as ego depletion: how being the target of prejudice affects self-control. Psychological Science, 17(3), 262 - 269. doi: 10.1111/j.1467-9280.2006.01695.x


Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434 - 446. doi: 10.1016/j.jml.2007.11.007
Jaeggi, S. M., Seewer, R., Nirkko, A. C., Eckstein, D., Schroth, G., Groner, R., & Gutbrod, K. (2003). Does excessive memory load attenuate activation in the prefrontal cortex? Load-dependent processing in single and dual tasks: functional magnetic resonance imaging study. Neuroimage, 19(2), 210 - 225. doi: 10.1016/S1053-8119(03)00098-3
Jaeggi, S. M., Studer-Luethi, B., Buschkuehl, M., Su, Y., Jonides, J., & Perrig, W. J. (2010). The relationship between n-back performance and matrix reasoning - implications for training and transfer. Intelligence, 38(6), 625 - 635. doi: 10.1016/j.intell.2010.09.001
Johns, M., Inzlicht, M., & Schmader, T. (2008). Stereotype threat and executive resource depletion: Examining the influence of emotion regulation. Journal of Experimental Psychology: General, 137(4), 691 - 705. doi: 10.1037/a0013834
Johnson, R., Jr., Barnhardt, J., & Zhu, J. (2004). The contribution of executive processes to deceptive responding. Neuropsychologia, 42(7), 878 - 901. doi: 10.1016/j.neuropsychologia.2003.12.005
Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: a new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54 - 69. doi: 10.1037/a0028347
Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice Hall.
Karim, A., Schneider, M., Lotze, M., Veit, R., Sauseng, P., Braun, C., & Birbaumer, N. (2010). The truth about lying: Inhibition of the anterior prefrontal cortex improves deceptive behaviour. Cerebral Cortex, 20(1), 205 - 213. doi: 10.1093/cercor/bhp090
Kashy, D., & DePaulo, B. M. (1996). Who lies? Journal of Personality and Social Psychology, 70(5), 1037 - 1051. doi: 10.1037/0022-3514.70.5.1037
Keating, C. F., & Heltman, K. R. (1994). Dominance and deception in children and adults: Are leaders the best misleaders? Personality and Social Psychology Bulletin, 20(3), 312 - 321. doi: 10.1177/0146167294203009


Kintsch, W. (1998). Comprehension: A paradigm for cognition. Cambridge, UK: Cambridge University Press.
Kirk, R. E. (1995). Experimental design. Pacific Grove, CA: Brooks/Cole.
Klayman, J., & Ha, Y. W. (1987). Confirmation, disconfirmation, and information in hypothesis testing. Psychological Review, 94(2), 211 - 228. doi: 10.1037/0033-295X.94.2.211
Knapp, M. L., Hart, R. P., & Dennis, H. S. (1974). An exploration of deception as a communication construct. Human Communication Research, 1(1), 15 - 29. doi: 10.1111/j.1468-2958.1974.tb00250.x
Krauss, R. M. (1981). Impression formation, impression management, and nonverbal behaviors. In E. T. Higgins, C. P. Herman, & M. P. Zanna (Eds.), Social cognition: The Ontario Symposium (Vol. 1, pp. 323 - 341). Hillsdale, NJ: Erlbaum.
Langleben, D. D., Loughead, J. W., Bilker, W. B., Ruparel, K., Childress, A. R., Busch, S. I., & Gur, R. C. (2005). Telling truth from lie in individual subjects with fast event-related fMRI. Human Brain Mapping, 26, 262 - 272. doi: 10.1002/hbm.20191
Leal, S., Vrij, A., Mann, S., & Fisher, R. (2010). Detecting true and false opinions: The devil’s advocate approach as a lie detection aid. Acta Psychologica, 134(3), 323 - 329. doi: 10.1016/j.actpsy.2010.03.005
Levine, T. R. (2010). A few transparent liars: Explaining 54% accuracy in deception detection experiments. In C. Salmon (Ed.), Communication Yearbook 34 (pp. 40 - 61). Thousand Oaks, CA: Sage.
Levine, T. R., & McCornack, S. A. (2001). Behavioral adaptation, confidence, and heuristic-based explanations of the probing effect. Human Communication Research, 27(4), 471 - 502. doi: 10.1111/j.1468-2958.2001.tb00790.x
Levine, T. R., Park, H. S., & McCornack, S. A. (1999). Accuracy in detecting truths and lies: Documenting the “veracity effect”. Communication Monographs, 66(2), 125 - 144. doi: 10.1080/03637759909376468


Levine, T. R., Serota, K. B., Shulman, H., Clare, D. D., Park, H. S., Shaw, A. S., . . . Lee, J. H. (2011). Sender demeanor: individual differences in sender believability have a powerful impact on deception detection judgments. Human Communication Research, 37(3), 377 - 403. doi: 10.1111/j.1468-2958.2011.01407.x
Logan, G. D. (1994). On the ability to inhibit thought and action: a user’s guide to the stop signal paradigm. In D. Dagenbach & T. H. Carr (Eds.), Inhibitory processes in attention, memory, and language (pp. 189 - 239). San Diego: Academic Press.
Logan, G. D., Schachar, R. J., & Tannock, R. (1997). Impulsivity and inhibitory control. Psychological Science, 8(1), 60 - 64. doi: 10.1111/j.1467-9280.1997.tb00545.x
MacLeod, C. M. (1991). Half a century of research on the Stroop effect: an integrative review. Psychological Bulletin, 109(2), 163 - 203. doi: 10.1037/0033-2909.109.2.163
Mameli, F., Mrakic-Sposta, S., Vergan, M., Fumagalli, M., Macis, M., Ferrucci, R., … Priori, A. (2010). Dorsolateral prefrontal cortex specifically processes general – but not personal – knowledge deception: Multiple brain networks for lying. Behavioural Brain Research, 211(2), 164 - 168. doi: 10.1016/j.bbr.2010.03.024
Mann, S., & Vrij, A. (2006). Police officers’ judgements of veracity, tenseness, cognitive load and attempted behavioural control in real life police interviews. Psychology, Crime, & Law, 12(3), 307 - 319. doi: 10.1080/10683160600558444
Mann, S., Vrij, A., & Bull, R. (2002). Suspects, lies and videotape: An analysis of authentic high-stakes liars. Law and Human Behavior, 26(3), 365 - 376. doi: 10.1023/A:1015332606792
Mayer, J. D., & Gaschke, Y. N. (1988). The experience and metaexperience of mood. Journal of Personality and Social Psychology, 55, 102 - 111.
McCornack, S. A., Morrison, K., Esther Paik, J., Wisner, A. M., & Zhu, X. (2014). Information manipulation theory 2: A propositional theory of deceptive discourse production. Journal of Language and Social Psychology, 33(3), 348 - 377. doi: 10.1177/0261927X14534656
Miller, G. R., deTurck, M. A., & Kalbfleisch, P. J. (1983). Self-Monitoring, rehearsal, and deceptive communication. Human Communication Research, 10(1), 97 - 117. doi: 10.1111/j.1468-2958.1983.tb00006.x


Miyake, A., & Friedman, N. P. (2012). The Nature and Organization of Individual Differences in Executive Functions: Four General Conclusions. Current Directions in Psychological Science, 21(1), 8 - 14. doi: 10.1177/0963721411429458
Miyake, A., Friedman, N. P., Emerson, M. J., Witzki, A. H., Howerter, A., & Wager, T. D. (2000). The Unity and Diversity of Executive Functions and Their Contributions to Complex "Frontal Lobe" Tasks: A Latent Variable Analysis. Cognitive Psychology, 41(1), 49 - 100. doi: 10.1006/cogp.1999.0734
Morgan, C., LeSage, J., & Kosslyn, S. (2009). Types of deception revealed by individual differences in cognitive abilities. Social Neuroscience, 4(6), 554 - 569. doi: 10.1080/17470910802299987
National Research Council (2003). The polygraph and lie detection. Committee to Review the Scientific Evidence on the Polygraph. Washington, DC: The National Academies Press.
Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175 - 220. doi: 10.1037/1089-2680.2.2.175
Norman, D. A., & Bobrow, D. J. (1975). On data-limited and resource-limited processes. Cognitive Psychology, 7(1), 44 - 64. doi: 10.1016/0010-0285(75)90004-3
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill.
O’Hair, H. D., Cody, M., & McLaughlin, M. L. (1981). Prepared lies, spontaneous lies, Machiavellianism and nonverbal communication. Human Communication Research, 7(4), 325 - 339. doi: 10.1111/j.1468-2958.1981.tb00579.x
O'Sullivan, M. (2008). Home Runs and Humbugs: Comment on Bond and DePaulo (2008). Psychological Bulletin, 134(4), 493 - 497. doi: 10.1037/0033-2909.134.4.493
Owen, A. M., McMillan, K. M., Laird, A. R., & Bullmore, E. (2005). N-back working memory paradigm: A meta-analysis of normative functional neuroimaging studies. Human Brain Mapping, 25(1), 46 - 59. doi: 10.1002/hbm.20131
Pandey, W., & Elliott, S. (2010). Suppressor Variables in Social Work Research: Ways to Identify in Multiple Regression Models. Journal of the Society for Social Work and Research, 1(1), 28 - 40. doi: 10.5243/jsswr.2010.2


Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5(5), 411 - 419. Stable URL: http://search.proquest.com/docview/814967099?accountid=12763
Polak, A., & Harris, P. L. (1999). Deception by young children following noncompliance. Developmental Psychology, 35(2), 561 - 568. doi: 10.1037//0012-1649.35.2.561
Porter, S., ten Brinke, L., Baker, A., & Wallace, B. (2011). Would I lie to you? “leakage” in deceptive facial expressions relates to psychopathy and emotional intelligence. Personality and Individual Differences, 51, 133 - 137. doi: 10.1016/j.paid.2011.03.031
Priori, A., Mameli, F., Cogiamanian, F., Marceglia, S., Tiriticco, M., Mrakic-Sposta, S., … & Zago, S. (2008). Lie-specific involvement of dorsolateral prefrontal cortex in deception. Cerebral Cortex, 18(2), 451 - 455. doi: 10.1093/cercor/bhm088
Quinn, G. P., & Keough, M. J. (2002). Experimental design and data analysis for biologists. Cambridge, UK: Cambridge University Press.
Raaijmakers, J. G. W., Schrijnemakers, J. M. C., & Gremmen, F. (1999). How to deal with "the language-as-fixed-effect fallacy": common misconceptions and alternative solutions. Journal of Memory and Language, 41(3), 416 - 426. doi: 10.1006/jmla.1999.2650
Richeson, J. A., & Shelton, J. N. (2003). When prejudice does not pay: Effects of interracial contact on executive function. Psychological Science, 14(3), 287 - 290. doi: 10.1111/1467-9280.03437
Rietveld, T., & Van Hout, R. (2007). Analysis of variance for repeated measures designs with word materials as a nested random or fixed factor. Behavior Research Methods, 39(4), 735 - 747. doi: 10.3758/BF03192964
Riggio, R. E., & Friedman, H. (1983). Individual differences and cues to deception. Journal of Personality and Social Psychology, 45(4), 899 - 915. doi: 10.1037/0022-3514.45.4.899
Riggio, R. E., Salinas, C., & Tucker, J. (1988). Personality and deception ability. Personality and Individual Differences, 9(1), 189 - 191. doi: 10.1016/0191-8869(88)90050-5


Riggio, R. E., Tucker, J., & Throckmorton, B. (1988). Social skills and deception ability. Personality and Social Psychology Bulletin, 13(4), 568 - 577. doi: 10.1177/0146167287134013
Riggio, R. E., Tucker, J., & Widaman, K. F. (1987). Verbal and nonverbal cues as mediators of deception ability. Journal of Nonverbal Behavior, 11(3), 126 - 145. doi: 10.1007/BF00990233
Ruxton, G. D., & Beauchamp, G. (2008). Time for some a priori thinking about post hoc testing. Behavioural Ecology, 19(3), 690 - 693. doi: 10.1093/beheco/arn020
Sanchez-Cubillo, I., Perianez, J. A., Adrover-Roig, D., Rodriguez-Sanchez, J. M., Rios-Lago, M., Tirapu, J., & Barcelo, F. (2009). Construct validity of the trail making test: role of task-switching, working memory, inhibition/interference control, and visuomotor abilities. Journal of the International Neuropsychological Society, 15(3), 438 - 450. doi: 10.1017/S1355617709090626
Santa, J. L., Miller, J. J., & Shaw, M. L. (1979). Using Quasi F to prevent alpha inflation due to stimulus variation. Psychological Bulletin, 86(1), 37 - 46. doi: 10.1037/0033-2909.86.1.37
Schmeichel, B. J. (2007). Attention control, memory updating, and emotion regulation temporarily reduce the capacity for executive control. Journal of Experimental Psychology: General, 136(2), 241 - 255. doi: 10.1037/0096-3445.136.2.241
Serota, K. B., Levine, T. R., & Boster, F. J. (2010). The prevalence of lying in America: Three studies of self-reported lies. Human Communication Research, 36(1), 2 - 25. doi: 10.1111/j.1468-2958.2009.01366.x
Šidák, Z. K. (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62(318), 626 - 633. doi: 10.1080/01621459.1967.10482935
Siegman, A. W., & Reynolds, M. A. (1983). Self-monitoring and speech in feigned and unfeigned lying. Journal of Personality and Social Psychology, 45(6), 1325 - 1333. doi: 10.1037/0022-3514.45.6.1325
Sodian, B., Taylor, C., Harris, P. L., & Perner, J. (1991). Early deception and the child’s theory of mind: False trails and genuine markers. Child Development, 62(3), 468 - 483. doi: 10.1111/j.1467-8624.1991.tb01545.x

Sokal, R. R., & Rohlf, F. J. (1995). Biometry (3rd ed.). New York, NY: W. H. Freeman.

Spence, S. A., Farrow, T. F. D., Herford, A. E., Wilkinson, I. D., Zheng, Y., & Woodruff, P. W. R. (2001). Behavioural and functional anatomical correlates of deception in humans. Neuroreport: For Rapid Communication of Neuroscience Research, 12(13), 2849–2853. doi: 10.1097/00001756-200109170-00019

Spence, S. A., Hunter, M. D., Farrow, T. F., Green, R. D., Leung, D. H., Hughes, C. J., & Ganesan, V. (2004). A cognitive neurobiological account of deception: Evidence from functional neuroimaging. Philosophical Transactions of the Royal Society B: Biological Sciences, 359, 1755–1762. doi: 10.1098/rstb.2004.1555

Sporer, S. L., & Schwandt, B. (2006). Paraverbal indicators of deception: A meta-analytic synthesis. Applied Cognitive Psychology, 20(4), 421–446. doi: 10.1002/acp.1190

Sporer, S. L., & Schwandt, B. (2007). Moderators of nonverbal indicators of deception: A meta-analytic synthesis. Psychology, Public Policy, and Law, 13(1), 1–34. doi: 10.1037/1076-8971.13.1.1

Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection theory measures. Behavior Research Methods, Instruments, & Computers, 31(1), 137–149. doi: 10.3758/BF03207704

Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9(2), 164–182. doi: 10.1037/1082-989x.9.2.164

Stiff, J. B., & Miller, G. R. (1986). “Come to think of it . . . ”: Interrogative probes, deceptive communication, and deception detection. Human Communication Research, 12(3), 339–357. doi: 10.1111/j.1468-2958.1986.tb00081.x

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18(6), 643–662. doi: 10.1037/h0054651

Talwar, V., & Lee, K. (2002). Development of lying to conceal a transgression: Children’s control of expressive behaviour during verbal deception. International Journal of Behavioral Development, 26(5), 436–444. doi: 10.1080/01650250143000373

Tyler, J. M., & Burns, K. C. (2008). After depletion: The replenishment of the self’s regulatory resources. Self and Identity, 7, 305–321. doi: 10.1080/15298860701799997

Tyler, J. M., Feldman, R. S., & Reichert, A. (2006). The price of deceptive behavior: Disliking and lying to people who lie to us. Journal of Experimental Social Psychology, 42(1), 69–77. doi: 10.1016/j.jesp.2005.02.003

Van der Linden, D., Frese, M., & Meijman, T. F. (2003). Mental fatigue and the control of cognitive processes: Effects on perseveration and planning. Acta Psychologica, 113(1), 45–65. doi: 10.1016/S0001-6918(02)00150-6

Visu-Petra, G., Miclea, M., & Visu-Petra, L. (2012). RT-based detection of concealed information in relation to individual differences in executive functioning. Applied Cognitive Psychology, 26(3), 342–351. doi: 10.1002/acp.1827

Vohs, K. D., & Heatherton, T. F. (2000). Self-regulatory failure: A resource-depletion approach. Psychological Science, 11(3), 249–254. doi: 10.1111/1467-9280.00250

Vrij, A. (2008). Detecting lies and deceit: Pitfalls and opportunities (2nd ed.). Chichester, UK: John Wiley & Sons.

Vrij, A., Edward, K., & Bull, R. (2001). Stereotypical verbal and nonverbal responses while deceiving others. Personality and Social Psychology Bulletin, 27(7), 899–909. doi: 10.1177/0146167201277012

Vrij, A., Fisher, R., Mann, S., & Leal, S. (2006). Detecting deception by manipulating cognitive load. Trends in Cognitive Sciences, 10(4), 141–142. doi: 10.1016/j.tics.2006.02.003

Vrij, A., Granhag, P. A., & Mann, S. (2010). Good liars. Journal of Psychiatry and Law, 38(1), 77–98. doi: 10.1177/009318531003800105

Vrij, A., Leal, S., Granhag, P. A., Mann, S., Fisher, R. P., Hillman, J., & Sperry, K. (2009). Outsmarting the liars: The benefits of asking unanticipated questions. Law and Human Behavior, 33(2), 159–166. doi: 10.1007/s10979-008-9143-y

Vrij, A., Leal, S., Mann, S., & Fisher, R. (2012). Imposing cognitive load to elicit cues to deceit: Inducing the reverse order technique naturally. Psychology, Crime & Law, 18(6), 579–594. doi: 10.1080/1068316X.2010.515987

Vrij, A., & Mann, S. (2006). Criteria-Based Content Analysis: An empirical test of its underlying processes. Psychology, Crime & Law, 12(4), 337–349. doi: 10.1080/10683160500129007

Vrij, A., Mann, S., & Fisher, R. (2006). Information-gathering vs accusatory interview style: Individual differences in respondents’ experiences. Personality and Individual Differences, 41(4), 589–599. doi: 10.1016/j.paid.2006.02.014

Vrij, A., Mann, S. A., Fisher, R. P., Leal, S., Milne, R., & Bull, R. (2008). Increasing cognitive load to facilitate lie detection: The benefit of recalling an event in reverse order. Law and Human Behavior, 32(3), 253–265. doi: 10.1007/s10979-007-9103-y

Vrij, A., Mann, S., Leal, S., & Fisher, R. (2010). “Look into my eyes”: Can an instruction to maintain eye contact facilitate lie detection? Psychology, Crime & Law, 16(4), 327–348. doi: 10.1080/10683160902740633

Vrij, A., Semin, G. R., & Bull, R. (1996). Insight in behavior displayed during deception. Human Communication Research, 22(4), 544–562. doi: 10.1111/j.1468-2958.1996.tb00378.x

Walczyk, J. J., Harris, L. L., Duck, T. K., & Mulay, D. D. (2014). A social-cognitive framework for understanding serious lies: Activation-decision-construction-action theory. New Ideas in Psychology, 34, 22–36. doi: 10.1016/j.newideapsych.2014.03.001

Walczyk, J. J., Roper, K. S., Seemann, E., & Humphrey, A. M. (2003). Cognitive mechanisms underlying lying to questions: Response time as a cue to deception. Applied Cognitive Psychology, 17(7), 755–774. doi: 10.1002/acp.914

Wallbott, H. G., & Scherer, K. R. (1991). Stress specifics: Differential effects of coping, gender, and type of stressor on automatic arousal, facial expressions, and subjective feeling. Journal of Personality and Social Psychology, 61(1), 147–156. doi: 10.1037/0022-3514.61.1.147

Ward, A., & Mann, T. (2000). Don’t mind if I do: Disinhibited eating under cognitive load. Journal of Personality and Social Psychology, 78(4), 753–763. doi: 10.1037/0022-3514.78.4.753

Wechsler, D. (1999). Wechsler Abbreviated Scale of Intelligence. San Antonio, TX: The Psychological Corporation.

Wickens, T. D., & Keppel, G. (1983). On the choice of design and of test statistic in the analysis of experiments with sampled materials. Journal of Verbal Learning and Verbal Behavior, 22(3), 296–309. doi: 10.1016/S0022-5371(83)90208-6

Wright, R. (1994). The moral animal: Evolutionary psychology and everyday life. New York, NY: Pantheon Books.

Wright, G., Berry, C., & Bird, G. (2012). Deceptively simple … The “deception-general” ability and the need to put the liar under the spotlight. Frontiers in Neuroscience, 7. doi: 10.3389/fnins.2013.00152

Wright Whelan, C., Wagstaff, G. F., & Wheatcroft, J. M. (2014). High-stakes lies: Verbal and nonverbal cues to deception in public appeals for help with missing or murdered relatives. Psychiatry, Psychology and Law, 21(4), 523–537. doi: 10.1080/13218719.2013.839931

Zuckerman, M., DePaulo, B. M., & Rosenthal, R. (1981). Verbal and nonverbal communication of deception. In L. Berkowitz (Ed.), Advances in experimental social psychology, volume 14 (pp. 1–59). New York, NY: Academic Press. doi: 10.1016/S0065-2601(08)60369-X

Appendices ______

APPENDICES

Appendix A Study 1 - Participant-Sender Opinion Survey

For each of the following statements, please circle the response that best reflects your opinion. Keep in mind that the next stage of the study requires you to provide a verbal response and short justification for some of your answers.

1. I believe that the carbon tax should be repealed
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

2. I believe that fewer boatpeople should be accepted into Australia as refugees
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

3. I believe that the rich should pay more taxes
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

4. I believe that too much money is spent on welfare
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

5. I believe that Australia has a moral obligation to seek out terrorism no matter where it occurs
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

6. I believe that Socialism is a better economic system than Capitalism
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

7. I believe that Australians live in a safer community than they did 20 years ago
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

8. I believe that Julia Gillard is a good prime minister
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

9. I believe that modern music negatively influences children
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

10. I believe that marijuana should be legalized
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

11. I believe that torture is sometimes justified
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

12. I believe that video games are bad for children
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

13. I believe that Australia should build more nuclear power plants
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

14. I believe that human genetic engineering is unethical
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

15. I believe that police should have access to our internet history
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

16. I believe that airport security regulations are too strict
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

17. I believe that paparazzi laws are too lenient
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

18. I believe that governments should mandate health insurance
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

19. I believe that there should be a tax on high fat foods
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

20. I believe that all forms of advertising tobacco products should be banned
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

Appendix B Study 1 - Self-Report Scales

Appendix C Study 1 - Participant-Receiver Opinion Survey

For each of the following statements, please click on the response that you think best reflects the majority of Australian university students’ opinions. That is, if you believe that the majority of Australian university students would agree with the particular statement then please click on the ‘agree’ option. If you believe that the majority of Australian university students would disagree with the particular statement then please click on the ‘disagree’ option.

Appendix D Study 2 - Participant-Sender Opinion Survey

For each of the following statements, please circle the response that best reflects your opinion. Keep in mind that the next stage of the study requires you to provide a verbal response and short justification for some of your answers.

1. I believe that the carbon tax should be repealed
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

2. I believe that too much money is spent on welfare
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

3. I believe that Australia has a moral obligation to seek out terrorism no matter where it occurs
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

4. I believe that Julia Gillard is a good prime minister
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

5. I believe that modern music negatively influences children
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

6. I believe that marijuana should be legalized
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

7. I believe that video games are bad for children
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

8. I believe that human genetic engineering is unethical
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

9. I believe that airport security regulations are too strict
   Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

10. I believe that paparazzi laws are too lenient
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

11. I believe that governments should mandate health insurance
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

12. I believe that there should be a tax on high fat foods
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

13. I believe that all forms of advertising tobacco products should be banned
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

14. I believe that stem cell research should be abandoned
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

15. I believe that a ‘flat tax’ system is the fairest system
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

16. I believe that humans are inherently good
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

17. I believe that ‘free will’ exists
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

18. I believe that parents are responsible for their child’s actions
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

19. I believe that individuals should have the right to bear arms
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

20. I believe that online education is the way of the future
    Strongly Disagree    Disagree    Neutral    Agree    Strongly Agree

Appendix E Study 2 - Supplementary Self-Report Scales

Appendix F Study 3 – Brief Mood Introspection Scale

INSTRUCTIONS: Please click on the response that indicates how well each adjective or phrase describes your present mood.

Appendix G Methods for Monte Carlo Simulations

Each decision-level dataset was created by randomly sampling 200 participants from two hypothetical populations: a population of senders and a population of judges. These participants were then randomly assigned to the levels of the independent factors. Each population of potential participants was assigned a normal distribution of accuracy scores (long-run averages of the percentage of messages correctly detected) and a normal distribution of bias scores (long-run averages of the percentage of messages judged as truthful). Means and standard deviations were set to the meta-analytic estimates reported in Bond and DePaulo (2008). Participants, along with their accuracy and bias scores, were then organized in a rectangular matrix, with each sender assigned a unique column and each judge a unique row. An example dataset is presented in Table 14.

Table 14
Example Decision-Level Dataset used in Monte Carlo Simulations

Old Interview Protocol
                              Senders
                              Truth             Lie
                              A       B         C       D
  Sender accuracy             49.99   50.73     51.25   52.21
  Sender bias                 61.01   50.29     69.20   58.89
  Judges (accuracy, bias)
  a (54.36, 50.49)            1       1         0       1
  b (54.48, 51.38)            1       0         0       1
  c (54.51, 47.97)            1       0         1       0
  d (52.38, 50.41)            1       0         0       1

New Interview Protocol
                              Senders
                              Truth             Lie
                              E       F         G       H
  Sender accuracy             54.48   52.87     49.37   58.65
  Sender bias                 50.12   51.41     51.12   68.59
  Judges (accuracy, bias)
  e (53.74, 56.33)            1       1         0       0
  f (54.88, 55.45)            1       0         1       1
  g (55.12, 58.83)            0       1         1       1
  h (54.35, 50.05)            0       1         0       0

Note: Scores associated with senders/judges represent long-run averages. The first score represents their assigned long-run accuracy score (the percentage of correct outcomes on a test of infinite length), whereas the second score represents their assigned bias score (the percentage of messages judged as truthful on a test of infinite length). Cell entries are outcomes (1 = correct, 0 = incorrect).

The outcome for each cell in the data matrix was determined probabilistically, with the likelihood of a correct outcome dependent on two scores: the cell accuracy score (the average of the respective sender/judge accuracy scores) and the cell bias score (the average of the respective sender/judge bias scores). It is helpful to consider each cell’s accuracy score as the baseline likelihood of a correct outcome in the cell and each cell’s bias score as providing an adjustment to this baseline likelihood based on veracity condition. The calculation of each score will be explained in turn.

To calculate each cell accuracy score, the respective sender/judge accuracy scores associated with each cell were averaged. Before this was done, however, it was first necessary to modify the individual sender/judge accuracy scores. To demonstrate why this was necessary, consider a judge with an accuracy score of 49.99%. To obtain the desired level of accuracy in the long run, the average likelihood of a correct outcome for the cells in the judge’s respective row should be 49.99%. To calculate this average likelihood, the judge’s score is averaged with the mean sender accuracy (54%). This process causes the average likelihood of each cell in the judge’s respective row to be regressed towards the mean sender accuracy (here, (49.99 + 54)/2 ≈ 52.00% rather than 49.99%). To correct this, individual judge accuracy scores were multiplied by 2 and the mean sender accuracy was then subtracted. This modification produces individual judge accuracy scores that, when combined with the sender scores, yield suitable average likelihoods. For example, the average likelihood of a correct outcome for a judge with an accuracy score of 49.99% would be calculated as ((49.99 ∗ 2 − 54) + 54)/2 = 49.99. Using the modified scores, on a test of infinite length, the average likelihood of a correct outcome for the cells in a judge’s respective row equals that participant’s original accuracy score. The same logic applies to the sender accuracy scores, thus the procedure was applied to both judge and sender scores. Once the individual sender/judge accuracy scores had been modified, each cell’s accuracy score was calculated by averaging the respective sender/judge modified accuracy scores.
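
The score modification can be illustrated with a minimal Python sketch. The function names (`modify_score`, `cell_accuracy`) are hypothetical and introduced only for illustration; they are not taken from the simulation code used in the thesis. The population mean of 54 and the scores for sender A and judge a are the values given in the text and in Table 14.

```python
def modify_score(score, other_population_mean):
    """Counteract regression towards the other population's mean:
    multiply the individual score by 2, then subtract that mean."""
    return 2 * score - other_population_mean


def cell_accuracy(sender_score, judge_score, mean_sender, mean_judge):
    """Average the modified sender and judge accuracy scores for one cell."""
    sender_mod = modify_score(sender_score, mean_judge)
    judge_mod = modify_score(judge_score, mean_sender)
    return (sender_mod + judge_mod) / 2


# A judge with 49.99% accuracy paired with an average sender (54%):
# the modified score restores the intended long-run row average.
row_average = (modify_score(49.99, 54.0) + 54.0) / 2
print(round(row_average, 2))  # 49.99

# Cell Aa from Table 14 (sender A = 49.99, judge a = 54.36).
print(round(cell_accuracy(49.99, 54.36, 54.0, 54.0), 2))  # 50.35
```

Without the modification, the same judge paired with an average sender would yield (49.99 + 54)/2, which is pulled towards 54 rather than remaining at 49.99.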
The cell accuracy scores reflect the baseline likelihood of a correct outcome for each cell. To illustrate, sender A’s individual accuracy score is 49.99% while judge a’s is 54.36%, thus cell Aa’s accuracy score is ((49.99 ∗ 2 − 54) + (54.36 ∗ 2 − 54))/2 = 50.35. This score means that when sender A’s message is evaluated by judge a, there is a 50.35% chance that the outcome will be correct.

After calculating cell accuracy scores, cell bias scores were calculated (using modified individual sender/judge bias scores) and used to adjust the cell accuracy scores. As was noted earlier, bias and accuracy interact and, when veracity is manipulated between-subjects, outcomes may be largely dependent on which sender gets allocated to which level of veracity. On average, a participant with no bias will have 50% of the messages in their respective row/column rated as truthful, thus they will have no advantage or disadvantage with regard to their truth or lie accuracy. A participant with a truth bias, on the other hand, has a certain advantage with regard to their truth accuracy. For instance, a participant who has 67.65% of the messages in their respective row/column rated as truthful will necessarily have at least 17.65% of the truthful messages correctly identified. This advantage, however, comes at the cost of reduced accuracy for lies, with at best 82.35% of lies being correctly identified. Probabilistically speaking, the participant has a 17.65% increased chance of having any given truthful message correctly identified and a 17.65% decreased chance of having any given lie correctly identified.

To calculate each cell’s adjustment, 50% (no bias) was subtracted from each cell’s bias score. If the cell belonged to the truth condition, the adjustment score was added to the cell accuracy score, whereas if the cell belonged to the lie condition the adjustment score was subtracted from the cell accuracy score. For example, if sender A had a bias score of 61.01 and judge a had a bias score of 50.49, then the bias score for cell Aa would be ((61.01 ∗ 2 − 55.23) + (50.49 ∗ 2 − 55.23))/2 = 56.27, giving an adjustment score of 56.27 − 50 = 6.27. As this cell belongs to the truth condition, the adjustment value is added to the cell accuracy score, giving an adjusted likelihood of a correct outcome of 50.35 + 6.27 = 56.62. This score is greater than the unadjusted cell accuracy score primarily because sender A is the type of person that tends to be rated as truthful.

The final score for each cell gives the likelihood of a correct outcome. To probabilistically assign actual outcomes, each cell was assigned a random score ranging from 1 to 100 (sampled with replacement from a uniform distribution).
If the final score was greater than the random score, the cell outcome was correct (and given a 1); otherwise it was incorrect (and given a 0). As the random scores were sampled from a uniform distribution, the proportion of random scores falling below the final cell score would, in the long run, be expected to equal the final cell score itself.
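
The bias adjustment and the probabilistic assignment of outcomes can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code used to run the simulations: the function names are hypothetical, the means of 54 and 55.23 are the values that appear in the worked examples, and the random score is assumed to be a continuous draw on the interval from 0 to 100 (the text describes the range as 1 to 100 without stating whether the draws were integers).

```python
import random


def modify_score(score, population_mean):
    """Multiply the individual score by 2, then subtract the mean."""
    return 2 * score - population_mean


def final_cell_score(sender_acc, judge_acc, sender_bias, judge_bias,
                     is_truth, mean_acc=54.0, mean_bias=55.23):
    """Likelihood (in %) of a correct outcome for one sender/judge cell."""
    acc = (modify_score(sender_acc, mean_acc)
           + modify_score(judge_acc, mean_acc)) / 2
    bias = (modify_score(sender_bias, mean_bias)
            + modify_score(judge_bias, mean_bias)) / 2
    adjustment = bias - 50.0  # 50% corresponds to no bias
    # Truth bias helps truthful messages and hurts deceptive ones.
    return acc + adjustment if is_truth else acc - adjustment


def draw_outcome(final_score, rng):
    """Return 1 (correct) if the final score exceeds a uniform random score."""
    return 1 if final_score > rng.uniform(0.0, 100.0) else 0


# Cell Aa from Table 14, truth condition.
score_aa = final_cell_score(49.99, 54.36, 61.01, 50.49, is_truth=True)
print(round(score_aa, 2))  # 56.62

# Long-run check: the proportion of correct outcomes approaches the score.
rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
n_draws = 100_000
proportion = sum(draw_outcome(score_aa, rng) for _ in range(n_draws)) / n_draws
print(proportion)
```

Over many draws the proportion of correct outcomes for cell Aa should settle near 0.5662, which is the long-run property described above.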
