Institute of Psychology and Education

How to (Not) Measure Self-Favoring Response Bias – An Examination of Impression Management and Overclaiming

DISSERTATION

for the degree of

DOCTOR of PHILOSOPHY

(Dr. phil.)

presented to the Faculty of Engineering, Computer and Psychology at Ulm University

by

Sascha Müller

from Fritzlar

2019

Dean: Prof. Dr. Maurits Ortmanns

1st Reviewer/Advisor: Prof. Dr. Morten Moshagen

2nd Reviewer: Prof. Dr. Johannes Zimmermann

3rd Reviewer: Prof. Dr. Oliver Wilhelm

Date of Oral Exam: 24.07.2019


Contents

Zusammenfassung ...... 4

Summary ...... 6

1 Introduction ...... 8

2 Lie Scales ...... 11

3 Overclaiming ...... 14

4 Aims of the Present Studies ...... 17

5 Studies ...... 18

5.1 Study 1: Controlling for Response Bias in Self-Ratings of Personality – A Comparison of Impression Management and Overclaiming ...... 18

5.2 Study 2: True Virtue, Self-Presentation, or Both? – A Behavioral Test of Impression Management and Overclaiming ...... 23

5.3 Study 3: Overclaiming Shares Processes With the Hindsight Bias ...... 27

6 General Discussion ...... 31

6.1 Impression Management, or How to Assess Virtuousness and Response Bias Using a Single Scale ...... 32

6.2 Overclaiming, or How to Assess Some Form of Openness and Biased Cognition ...... 35

6.3 A Status Update, or How to (Not) Deal With Self-Favoring Response Bias ...... 39

7 Conclusion ...... 42

References ...... 43

Appendices ...... 69


Zusammenfassung

Selbstberichtsmaße bieten einen schnellen und unkomplizierten Einblick in die Charakteristika einer Person. Daher werden Selbstberichte in Fragebögen nicht nur zu Forschungszwecken genutzt, sondern sie fließen auch in Situationen des alltäglichen Lebens, wie beispielsweise bei Persönlichkeitstests in Bewerbungsprozessen, ein. Eine potentielle Bedrohung für die Validität von Selbstauskünften entsteht allerdings, wenn nicht jede Person ehrliche Angaben macht. Eine Variante von Antwortverzerrungen ist die Tendenz, sozial erwünschte Antworten zu geben, um sich selbst positiver darzustellen, als man ist. Unter der Annahme, dass Personen sich hinsichtlich dieser Tendenz interindividuell unterscheiden, wurden verschiedene Ansätze zur Kontrolle von sozial erwünschtem Antwortverhalten vorgeschlagen.

Sogenannte Lügenskalen wie die Impression Management-Skala (Paulhus, 1991) sind so konzipiert, dass ihre Items sozial erwünschte oder unerwünschte Merkmale und Verhaltensweisen abdecken. Hohe Werte auf einer solchen Skala sollen demnach auf eine unrealistisch positive Selbstdarstellung hindeuten. Die Overclaiming-Technik (Paulhus, Harms, Bruce, & Lysy, 2003) funktioniert hingegen ähnlich wie ein Allgemeinwissenstest, indem Personen angeben, ob sie bestimmte Begriffe aus verschiedenen Wissensbereichen kennen. Einige dieser Begriffe existieren jedoch nicht, woraus folgt, dass eine behauptete Kenntnis dieser als objektive Wissensübertreibung interpretiert werden kann. Bisherige Forschungsergebnisse zur Validität beider Ansätze sind allerdings unbefriedigend. Aus diesem Grund wurde in der vorliegenden Dissertation mittels drei Studien untersucht, ob sich Impression Management und Overclaiming zur Messung von Selbstdarstellungstendenzen eignen. Überdies wurden mögliche Alternativerklärungen dafür betrachtet, wie hohe Werte auf beiden Maßen zustande kommen und interpretiert werden können, wenn nicht als Selbstdarstellung.


In einer ersten Studie konnte gezeigt werden, dass weder Impression Management noch Overclaiming geeignet sind, um Diskrepanzen zwischen Selbst- und Fremdeinschätzungen der Persönlichkeit zu erklären. Entgegen der Idee einer Lügenskala wurden Personen mit hohen Werten auf der Impression Management-Skala nicht nur von sich selbst, sondern auch von ihnen gut bekannten Personen eher als ehrlich und bescheiden eingeschätzt. In einer zweiten Studie waren hohe Werte auf der Impression Management-Skala mit ehrlicherem Verhalten in einem Cheatingparadigma assoziiert. Gleichzeitig stellten sich Personen mit hohen Werten aber auch sozial erwünschter dar, indem sie in einem hypothetischen Szenario behaupteten, sie würden sich prosozialer verhalten, als sie es später tatsächlich taten. Diese Ergebnisse deuten darauf hin, dass Impression Management mit der Persönlichkeitsdimension Ehrlichkeit-Bescheidenheit (z.B. Ashton & Lee, 2009) konfundiert und dennoch sensitiv gegenüber Selbstdarstellung ist. Da demnach weder Selbstdarstellungstendenzen noch Ausprägungen von Ehrlichkeit eindeutig gemessen werden, kann die Skala für sich genommen in der derzeitigen Form nicht zum Einsatz als Validitätsmaß empfohlen werden.

Overclaiming hingegen zeigte auch in der zweiten Studie keinen Zusammenhang mit Selbstdarstellung. Infolgedessen wurde mittels einer dritten Studie der Frage nachgegangen, ob der Effekt stattdessen eher auf verzerrte kognitive Prozesse zurückgeht. Tatsächlich hing die Tendenz, Kenntnis ausgedachter Begriffe für sich zu beanspruchen, mit dem sogenannten Hindsight Bias (z.B. Fischhoff, 1975) und damit assoziierten verzerrten Gedächtnisprozessen zusammen. Dies legt nahe, dass Overclaiming nicht zwingend als absichtliche Wissensübertreibung, sondern auch als unbewusste Wissensüberschätzung interpretiert werden kann. Insgesamt erwies sich keine der beiden Methoden als valides Maß von Selbstdarstellungstendenzen. Mögliche Schlussfolgerungen und nächste Schritte werden diskutiert.


Summary

Self-report measures conveniently provide insights into a person's characteristics. Hence, self-reported answers in questionnaires are not only used for research purposes, but they are also part of everyday situations such as personality tests in job application contexts. However, a potential threat to the validity of self-reports is that not everyone can be assumed to respond honestly to questionnaire items. One variant of misrepresentation is the tendency to present oneself overly favorably by providing socially desirable answers. Under the assumption that individuals differ with respect to this tendency, various approaches have been proposed to assess and control for socially desirable responding.

So-called lie scales such as the impression management scale (Paulhus, 1991) intentionally contain items describing socially desirable or undesirable characteristics and behaviors, so that high scores on these scales are supposed to signal an unrealistically positive self-presentation. The overclaiming technique (Paulhus, Harms, Bruce, & Lysy, 2003), on the other hand, is more akin to a general knowledge test in that people indicate whether they know certain terms from different areas of knowledge. Crucially, some of the presented terms do not exist, which means that claiming knowledge of them can be interpreted as an objective exaggeration of one's knowledge. However, previous research on the validity of both approaches yielded unsatisfactory and inconsistent results. For this reason, the present dissertation presents three studies examining whether impression management and overclaiming are suitable measures of self-favoring response biases. Furthermore, possible alternative explanations of high scores on both measures were considered.

In a first study, impression management and overclaiming did not explain discrepancies between self- and other-reports of personality. Contrary to the idea of a lie scale, people who received high scores on the impression management scale were rated as honest and humble not only by themselves but also by well-acquainted others. In a second study, this result was corroborated by the finding that individuals with high scores on the scale tended to behave more honestly in a cheating paradigm. At the same time, however, these individuals also presented themselves overly favorably by claiming that they would act more prosocially in a hypothetical scenario than they actually did later on. These results indicate that impression management is confounded with the personality dimension of honesty-humility (e.g., Ashton & Lee, 2009) but also sensitive to self-presentation. Thus, since neither trait manifestations of honesty nor self-presentation tendencies are measured unambiguously, the scale in its current form cannot be recommended for use as a validity measure.

Overclaiming, on the other hand, also failed to account for self-presentation in the second study. Hence, a third study investigated whether the effect can rather be attributed to biased cognitive processing. Indeed, the tendency to claim knowledge of bogus terms was related to the so-called hindsight bias (e.g., Fischhoff, 1975) and associated biased memory processes. This suggests that overclaiming need not reflect deliberate knowledge exaggeration but may instead reflect an unconscious overestimation of one's knowledge. Overall, neither of the two methods emerged as a valid measure of self-favoring response bias. Possible conclusions and next steps are discussed.


1 Introduction

In psychological assessment, researchers and practitioners typically rely on self-report questionnaires as the main tool to obtain information about an individual (e.g., Baumeister, Vohs, & Funder, 2007). This is because self-reports provide a convenient and economical measurement solution compared to alternatives such as interviews or observational methods. Self-report measures are based on a range of assumptions, with one being that respondents faithfully adhere to the instructions given and answer items in a way which offers insights into their individual dispositions. However, the validity of self-reported answers is easily diminished if individuals conceal their "true" nature by deliberately distorting their responses (e.g., Paulhus & Vazire, 2007).

One such response distortion is the tendency to present oneself in a more favorable light than warranted and in accordance with social norms, which has received several labels such as socially desirable responding, self-presentation, self-enhancement, or self-favoring response bias (e.g., Edwards, 1957; Hoorens, 1995; Paulhus, 1991; Schlenker, 1975). Instead of merely adding random noise to a measurement, individuals exhibiting this tendency are assumed to systematically bias their scores on a measure in the direction they deem more favorable. This specific response style is regarded as a rather trait-like characteristic that is relatively stable across time and situations (e.g., Crowne & Marlowe, 1964; Jackson & Messick, 1958). The concern that self-report instruments may be affected by individuals providing overly positive self-reports has been raised since the start of psychological assessment (e.g., Allport, 1928; Bernreuter, 1933) and continues to be a matter of discussion (e.g., Goffin & Boyd, 2009; Hough, 1998). The potential threat theoretically pertains to every assessment. In psychological research, it is a general prerequisite that scores on a measure reflect individual differences in the respective characteristic as accurately as possible, regardless of whether the measurement was obtained to construct, validate, or simply apply an instrument. If scores on a measure are distorted by some respondents, however, interpretations based on these are biased as well. Individual characteristics are potentially over- or underestimated and distributions of scores on a measure are distorted. A major problem is that associations between measures are biased as well, which can result in inaccurate conclusions concerning the validity of measures.

Other significant consequences of distorted self-reports arise in situations where important decisions are (at least partly) based on the assessment. In these situations, individuals might be particularly motivated to exaggerate their qualities and hide their flaws. For instance, personality assessment has become a relevant factor in job recruiting (Hough & Oswald, 2008; Rothstein & Goffin, 2006). Even though findings on whether applicants' overly positive self-presentation affects the validity of hiring decisions are inconsistent (e.g., Christiansen, Goffin, Johnston, & Rothstein, 1994; Donovan, Dwight, & Schneider, 2014), it might at least give some individuals an unfair advantage over honest applicants if it affects individual hiring decisions in the first place.

Like any potential threat to the validity of an assessment, the existence of self-favoring response bias has initiated the search for a remedy. In general, there are multiple techniques that are designed to elicit more honest responding. For example, putting pressure on respondents to reduce bias is the idea behind approaches such as psychophysiological lie detection via polygraphs measuring pulse, blood pressure, and respiration (Iacono, 2000, 2008), or the bogus pipeline procedure which involves a fake polygraph (Jones & Sigall, 1971; Roese & Jamieson, 1993). However, these approaches require a specific apparatus and individual testing. A different strategy is to reduce social desirability by means of indirect questioning and increased anonymity of the assessment. For example, the randomized response technique (RRT; Warner, 1965) adds random noise to individuals' responses such that it is not possible to determine individuals' actual characteristics through their responses in the assessment. Respondents are asked to answer either a sensitive question (e.g., "Have you ever used cocaine?") or the negation of this question ("Have you never used cocaine?"), depending on the outcome of a randomization process involving known probabilities (e.g., respondents' month of birth). Extensions of this technique (e.g., Moshagen, Musch, & Erdfelder, 2012) and similar variants (e.g., Yu, Tian, & Tang, 2008) aim at providing undistorted prevalence estimates of sensitive issues even in case of dishonest responding and have been shown to reduce socially desirable responding in comparison to direct questioning (e.g., Hoffmann, Diedenhofen, Verschuere, & Musch, 2015; Moshagen, Hilbig, Erdfelder, & Moritz, 2014; Thielmann, Heck, & Hilbig, 2016). However, RRT and related strategies require a specific measurement technique and, most importantly, do not allow for the detection of self-favoring response bias on the individual level due to their inherent focus on anonymity and confidentiality.
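To make the logic of such prevalence corrections concrete, the following minimal sketch (not taken from any of the cited studies; the numbers are hypothetical) shows how a prevalence estimate for the sensitive attribute can be recovered from the observed proportion of "yes" answers under Warner's (1965) design, in which each respondent answers the sensitive statement with a known probability p and its negation otherwise.

```python
def warner_rrt_estimate(yes_count: int, n: int, p: float) -> float:
    """Prevalence estimate under Warner's (1965) randomized response design.

    p is the known probability of answering the sensitive statement
    (with probability 1 - p, the negation is answered instead).
    Since P(yes) = p * pi + (1 - p) * (1 - pi), solving for pi gives
    pi = (lambda - (1 - p)) / (2 * p - 1), with lambda the observed "yes" rate.
    """
    if p == 0.5:
        raise ValueError("p must differ from .5; otherwise pi is not identified")
    lam = yes_count / n
    return (lam - (1 - p)) / (2 * p - 1)

# Hypothetical example: 330 "yes" answers among 1,000 respondents, p = .75
print(warner_rrt_estimate(330, 1000, 0.75))  # -> 0.16
```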

In most situations, though, the individual level is of major interest. Now, if everyone were to engage in self-enhancement and simply inflate their positive attributes, it would generally not be as problematic since rank orders would remain exactly the same as if everyone were answering truthfully. As opposed to this scenario, however, the general notion of self-favoring response bias is that some self-reports are systematically distorted while others are not. Hence, the idea is that there is variance, that is, individual differences in this response style between individuals (e.g., Crowne & Marlowe, 1964; Edwards, 1990; Paulhus, 2002). This conception implies that the individual degree of response bias can be measured, used to flag potentially inaccurate self-reports, or, ultimately, be statistically controlled for, revealing the less biased results. As a consequence, a long research tradition has been established in which researchers have proposed diverse approaches aiming at measuring the individual tendency to provide socially desirable responses. The most frequent approach is to use dedicated self-report scales intended to assess response distortion, which have also been termed "lie scales" (e.g., Eysenck, Eysenck, & Barrett, 1985; Gibson, 1962; Meehl & Hathaway, 1946) under the assumption that self-favoring response bias is a form of dishonest responding.

2 Lie Scales

First introduced by Hartshorne and May (1928), lie scales rest on the basic idea of presenting individuals with self-report items that are prone to elicit misrepresentation. Therefore, the items typically represent socially desirable but uncommon virtues or socially undesirable but common flaws (e.g., Paulhus, 1984). Thus, individuals intending to give a favorable impression by claiming the former and denying the latter are thought to present themselves in an unrealistically positive manner (e.g., "I never swear"; Paulhus, 1991). Although a plethora of measures has been introduced (e.g., Crowne & Marlowe, 1960; Edwards, 1957; Hathaway & McKinley, 1951; Paulhus, 1991; Sackeim & Gur, 1978; Wiggins, 1959), they all more or less adhere to this core principle and are thus similar to each other with regard to their item content. Of these measures, the Impression Management (IM) scale of the Balanced Inventory of Desirable Responding (BIDR; Paulhus, 1991) still receives much attention to date and is one of the standard tools to measure socially desirable responding. The IM scale is intended to capture deliberate self-presentation characterized by overly favorable self-descriptions that are given to impress others (Paulhus, 1984, 1991), which, however, "should appear even when there is no audience" (Paulhus, 2002, p. 62).

Given the underlying assumptions of all lie scales, high scores on a respective measure should reflect a response style characterized by providing overly positive self-reports. Theoretically, the scores on such a measure could then be used to statistically control for biased responding in other self-reported characteristics of interest. Unfortunately, attempts to demonstrate the utility of these purported validity measures did not yield satisfactory results (e.g., Borkenau & Ostendorf, 1992; Borkenau & Zaltauskas, 2009; Konstabel, Aavik, & Allik, 2006; Li & Bagger, 2006; Piedmont, McCrae, Riemann, & Angleitner, 2000). Apart from these reports, a major reason why lie scales have been criticized is the consistent finding that they exhibit substantial associations with personality traits such as emotional stability, conscientiousness, and agreeableness (e.g., Connelly & Chang, 2016; Li & Bagger, 2006; McCrae & Costa, 1983; Ones, Viswesvaran, & Reiss, 1996; Paulhus, 2002). Positive correlations alone could also be explained as being a result of socially desirable responding which causes elevated scores on desirable dispositions. This would rather indicate that lie scales work as intended. However, the circumstance that knowledgeable others provide ratings similar to those of the target on personality measures (e.g., Kurtz, Tarquini, & Iobst, 2008; McCrae & Costa, 1983) as well as on the lie scales themselves (e.g., Block, 1965; Konstabel et al., 2006) raises the question of whether individuals with high scores merely claim a virtuous character or truly possess it.

Given that lie scales show internal consistency, consistency across situations, and stability over time, Uziel (2010, 2014) argued that they do not reflect self-favoring response bias but a specific personality trait. In an extensive review, Uziel (2010) examined the accumulated associations between lie scales and personality dimensions, other-reports, and behavioral outcomes. It was concluded that individuals with high scores on these measures are, for example, agreeable, rather introverted, and self-controlled in their social behavior, and experience less negative affect. This profile was interpreted as reflecting interpersonally oriented self-control, such that individuals with higher scores on lie scales are successful self-regulators. Testing this account, Uziel (2014) found high self-other agreement for IM scores, and self-control was the main correlate of IM using self-reports and reports from romantic partners. In addition, higher IM scores were associated with a rather humble self-portrayal as visible in larger discrepancies between the actual and the ideal self.

As opposed to this focus on self-control, a different interpretation of lie scales is more in line with the latter finding by Uziel (2014) that individuals with high scores on these scales might be rather humble. De Vries, Zettler, and Hilbig (2014) noted that many IM items refer to integrity or dishonesty and compared the content of the scale to that of the honesty-humility (HH) dimension of the HEXACO model of personality (e.g., Ashton & Lee, 2009; Lee & Ashton, 2004). Indeed, some HH items reveal a striking similarity to lie scales such as IM as they also involve highly desirable or undesirable behaviors (e.g., using instrumental flattery to gain personal advantage). The apparent difference lies in the intention to use these items for assessing variation in true virtuousness rather than its mere assertion. In principle, the HH scale should also be prone to overly favorable self-reports due to its socially desirable content spanning facets of sincerity, fairness, modesty, and greed avoidance (Lee & Ashton, 2004). However, HH appears to be a valid indicator of true virtues as it predicts honest, fair, altruistic, and ethical behavior (Ashton, Lee, & De Vries, 2014; Ashton & Lee, 2008; Heck, Thielmann, Moshagen, & Hilbig, 2018; Hilbig, Thielmann, Wührl, & Zettler, 2015; Zhao & Smillie, 2015) and reaches typical levels of self-other agreement of personality (e.g., Ashton & Lee, 2009). De Vries et al. (2014) expected IM to mainly overlap with the HEXACO dimensions of HH and (to a lesser degree) agreeableness and conscientiousness, which also are correlates of self-control (De Vries & Van Gelder, 2013). This notion was supported by positive associations between self- and other-reports of both IM and the HEXACO dimensions.

Overall, IM and HH have been shown to be rather highly correlated, at around r = .50 (De Vries et al., 2014; Dunlop et al., 2017; Zettler, Hilbig, Moshagen, & De Vries, 2015). In general, positive correlations still might be inflated due to the positively valenced item content of the IM and HH scales, but the associations with other-reports should rather alleviate this concern (De Vries et al., 2014). Nonetheless, Zettler et al. (2015) sought to provide more convincing evidence of the suggested link between IM and trait honesty. Indeed, the authors found behavioral support for the notion that IM carries trait information of honesty as higher scores on IM were associated with more honest behavior in a cheating paradigm (see also Moshagen & Hilbig, 2017). Thus, based on the related accounts of IM scores reflecting honesty (De Vries et al., 2014; Zettler et al., 2015) or self-control (Uziel, 2010, 2014), removing IM variance from self-reported characteristics or flagging individuals with high scores as dishonest responders would be entirely misguided.

Considering the aforementioned lack of validation of lie scales and more recent opposing interpretations of scores on these measures, it might be necessary to overcome the apparent problems of using self-reported response scales as validity measures by using a different methodological approach. Such an approach has been proposed with the overclaiming technique.

3 Overclaiming

In a study on social desirability, Phillips and Clancy (1972) asked participants about their use of new products, books, television programs, and movies. However, none of these allegedly new items actually existed. The authors discovered that individuals who claimed familiarity with factually nonexistent items also presented themselves in a positive light regarding characteristics they deemed socially desirable (e.g., happiness, religiousness, marital satisfaction). While it was not possible to validate the truthfulness of the self-reported characteristics, the claim to know nonexistent items was a clear act of overstating one's knowledge, which was accordingly termed "overclaiming". By providing an objective deviation from a known criterion that is associated with social value, the method appeared to be advantageous over the standard procedure of assessing socially desirable responding via simple self-report scales.


Nevertheless, the method did not receive widespread attention until Paulhus, Harms, Bruce, and Lysy (2003) introduced the overclaiming technique (OCT) as an elaborated measure by proposing the overclaiming effect as a general indicator of self-favoring response bias. Crucially, the OCT relies on presenting existent terms ("targets" or "reals") that are complemented by nonexistent terms ("lures" or "foils"). By using signal detection theory (Swets, 1964; Macmillan & Creelman, 2005), separate indices of response bias (i.e., the tendency to claim knowledge of both existent and nonexistent items) and accuracy (i.e., the ability to discriminate between targets and lures by claiming knowledge of the former and ignorance of the latter) can be determined. As the implementation of an overclaiming measure typically resembles a general knowledge test more than a self-report questionnaire, asking respondents whether they know or are familiar with a respective term from various knowledge domains, it has rather been conceived as a behavioral method (Paulhus & Holden, 2010). Importantly, controlling for overclaiming is supposed to control for motivated misrepresentation in self-reports (Bing, Kluemper, Davison, Taylor, & Novicevic, 2011; Paulhus, 2011).
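As a minimal illustration of how such indices can be derived (a sketch of one common signal detection parameterization with hypothetical response counts, not the exact scoring used in the cited studies, which partly rely on other index variants):

```python
from statistics import NormalDist

def overclaiming_sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Accuracy (d') and response bias (criterion c) for an overclaiming task.

    Hits are claims of knowing existent items (targets/reals); false alarms are
    claims of knowing nonexistent items (lures/foils). A log-linear correction
    (+0.5 / +1) avoids infinite z-scores for perfect hit or false-alarm rates.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)              # discrimination of reals vs. foils
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))   # more negative = more liberal "yes" bias
    return d_prime, criterion

# Hypothetical respondent: claims 60 of 72 reals and 6 of 24 foils
print(overclaiming_sdt_indices(hits=60, misses=12, false_alarms=6, correct_rejections=18))
```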

However, attempts to test the validity of the technique also resulted in a scattered picture. On the one hand, overclaiming showed positive associations with impression management or narcissism (e.g., Bensch, Paulhus, Stankov, & Ziegler, 2019; Paulhus et al., 2003; Tracy, Cheng, Robins, & Trzesniewski, 2009). On the other hand, there are various reports of failed attempts to observe such associations (e.g., Barber, Barnes, & Carlson, 2013; Dunlop et al., 2017; Mesmer-Magnus, Viswesvaran, Deshpande, & Joseph, 2006). Likewise, the OCT functioned as a validity measure in some instances (Bing et al., 2011; Kemper & Menold, 2014), but was unrelated to misrepresentation concerning personality and cognitive ability (Ludeke & Makransky, 2016), not suitable to gauge item desirability of personality items (Kam, Risavy, & Perunovic, 2015), and only weakly associated with faking in a simulated job application context (Feeney & Goffin, 2015).

Similar to IM, the inconsistent evidence regarding overclaiming as a measure of deliberate self-presentation led to the emergence of other interpretations of the effect. While recommending the use of overclaiming over lie scales, Paulhus (2011) also noted that the effect includes a memory bias in the form of a feeling of knowing alongside a motivational component. In addition, overclaiming shows a consistent association with trait openness (e.g., Bing et al., 2011; Tonković, Galić, & Jerneić, 2011; Ziegler, Kemper, & Rammstedt, 2013). Dunlop et al. (2017) proposed a new conception of overclaiming with respect to this association. In a series of studies, the authors presented overclaiming as the result of a dispositional tendency to be curious, explorative, and educated. This interpretation was supported by overclaiming being associated with self- and other-ratings of trait openness, years spent in formal education, self-reported knowledge, and, importantly, actual knowledge, whereas there was no support for other influences such as a memory bias or narcissism. The authors suspected that the accumulated knowledge triggers an illusory familiarity with lures in an overclaiming task, as these might resemble terms individuals factually know. In line with this cognitive exploration account, others also found overclaiming to be associated with self-reported knowledge (after controlling for actual knowledge), crystallized intelligence, and typical intellectual engagement (e.g., Atir, Rosenzweig, & Dunning, 2015; Bensch et al., 2019; Hülür, Wilhelm, & Schipolowski, 2011).

In sum, neither lie scales nor the OCT have accumulated sufficient evidence to be considered valid measures of self-favoring response biases. However, the general interest in such measures is justified considering the various contexts in which they could provide important information about a respondent and guide further steps. Given the lasting popularity of lie scales and the potential of the OCT as their purported methodologically advanced successor, the main objective of the present dissertation is to compare these two approaches in a validation framework to examine their applicability as validity measures. In addition, alternative interpretations of both IM and overclaiming deserve closer attention, particularly if the measures do not turn out to be valid indicators of self-favoring response biases. Therefore, three online studies were conducted which are summarized in the following subsections.

4 Aims of the Present Studies

In light of the questionable validation status of lie scales and the conflicting evidence regarding the OCT, the present studies were primarily focused on putting these competing approaches to a comprehensive test. As both measures are intended to detect the tendency to provide overly favorable self-reports, it must be demonstrated that they are indeed suitable for doing so. To test this, two different paradigms were employed which both offer a potentially biased self-report on the one hand and arguably less biased information on the other. In both approaches, a valid measure of response distortion should thus account for overly positive self-presentation, that is, differences between self-reports and the respective other criterion.

Furthermore, recent alternative interpretations of IM suggest that the scale captures true virtues instead of, or at least in addition to, response bias, at odds with its designation as a lie scale. To this end, two studies included some form of honesty measure. Considering that only one study thus far has provided behavioral results linking IM to honest behavior more convincingly than correlations with the HH scale do, another major goal was to replicate this finding.

As for the OCT, a third study aimed to further explore potentially underlying mechanisms of the overclaiming effect, thereby examining the plausibility of alternative explanations of individuals claiming to be familiar with nonexistent terms. Addressing these issues, the present dissertation contributes to the ongoing search for a viable measure of self-serving response tendencies.

5 Studies

5.1 Study 1: Controlling for Response Bias in Self-Ratings of Personality – A Comparison of Impression Management and Overclaiming

The first study served as the starting point for further considerations. The focus was on a broad validation approach towards IM and the OCT. Even though the two approaches incorporate relatively different variants of assessing the individual extent of self-favoring response bias, high scores on a lie scale such as IM and on an overclaiming measure are both supposed to reflect the tendency to provide overly favorable self-reports. To validate measures of response biases, it is therefore necessary to also validate the respective self-reports, ensuring that these indeed deviate from reality (e.g., Paulhus, 2002, 2011). Thus, the major criticism of lie scales, namely that they assess actual traits instead of these deviations, must be overcome.

It has been argued that the validity of a self-report can only be judged when it is validated via a comparison with external criteria (e.g., Colvin & Block, 1994; Cronbach & Meehl, 1955), resulting in so-called criterion discrepancy measures (e.g., Paulhus et al., 2003). Here, a critical step is the definition of the chosen criterion representing "reality". In this vein, one possible strategy to validate self-reports is to obtain additional ratings of the targets from knowledgeable others (e.g., Colvin, Block, & Funder, 1995; John & Robins, 1994; Kurt & Paulhus, 2008; Kwan, John, Kenny, Bond, & Robins, 2004; Leising, Locke, Kurzius, & Zimmermann, 2016). Compared to lie scales, the other-report approach indeed seems to yield more successful results in exposing inaccurate self-reports, which is likely due to other-ratings having incremental predictive validity over self-ratings for a broad range of information about a target, for example, job performance (Connelly & Ones, 2010; Oh, Wang, & Mount, 2011), performance in intelligence and creativity tests (Vazire, 2010), prosocial behavior (Thielmann, Zimmermann, Leising, & Hilbig, 2017), or personality disorders (Miller, Pilkonis, & Clifton, 2005).

Self- and other-reports of personality typically show positive correlations around r = .50 (Connelly & Ones, 2010; Connolly, Kavanagh, & Viswesvaran, 2007), revealing substantial self-other discrepancies. These self-other asymmetries (Vazire, 2010) imply that observers can provide insights into an individual's personality that the self-report cannot, for instance, due to self-favoring response bias (Vazire & Carlson, 2011; Vazire & Mehl, 2008). Self-other agreement of personality increases with greater closeness of the relationship between self- and other-raters (Connelly & Ones, 2010; Connolly et al., 2007) and greater availability of relevant information when judging the target (Blackman & Funder, 1998; Funder, Kolar, & Blackman, 1995).

However, reaching consensus between raters appears to be more difficult for some personality dimensions than for others due to differing visibility and evaluativeness (Funder, 1995; John & Robins, 1993). For one, extraversion is highly visible as it is expressed externally (e.g., sociable behavior). Honesty-humility, agreeableness, or the intellect component of openness, on the other hand, are highly evaluative in that social norms, values, and positive valence are associated with them (e.g., John & Robins, 1993; Lee & Ashton, 2004). Evaluative traits should be particularly prone to self-favoring response bias as respondents might try to overstate them (Connelly & Ones, 2010; Funder, 1995). Accordingly, self-other agreement decreases with increased evaluativeness and desirability of a domain (Asendorpf & Ostendorf, 1998; De Vries, Realo, & Allik, 2016; John & Robins, 1993; Vazire, 2010). In addition to its evaluativeness, openness is also rather low in visibility (Funder & Dobroth, 1987) which might explain why self-views of this trait explain the most variance across self- and other-reports (McAbee & Connelly, 2016).

The major disadvantages of other-reports are, of course, that they are much more costly than simply administering a lie scale and also not readily available in most assessment contexts. Hence, researchers attempted to validate lie scales by showing that they account for the existing self-other discrepancies, but measures such as IM or the Marlowe-Crowne Social Desirability Scale (Crowne & Marlowe, 1960) were mostly unable to do so (e.g., Borkenau & Ostendorf, 1992; McCrae & Costa, 1983; Piedmont et al., 2000). McAbee and Connelly (2016) proposed the Trait-Reputation-Identity (TRI) model that distinguishes shared variance between self- and other-reports (common Trait factor), unique variance of other-reports (specific Reputation factor), and unique variance of self-reports (specific Identity factor) via a bifactor model (e.g., Holzinger & Swineford, 1937). In this framework, self-reported IM was unrelated to the Reputation factors for all Big Five dimensions, positively related to the Trait factors of agreeableness and conscientiousness, and negatively related to both the Trait and Identity factors of neuroticism. Although McAbee and Connelly rather interpreted the associations with the Trait factors as results of successful self-presentation behavior, these associations can also be viewed as indicating that the IM scale captured actual trait variance reflecting the personality profile associated with lie scales as presented by Uziel (2010), which, in turn, reflects correlates of honesty-humility (e.g., Ashton et al., 2014). Overclaiming, on the other hand, was associated with overly positive self-reports of extraversion, openness, and self-esteem when examined via deviations between self- and other-reports (Paulhus et al., 2003).
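In schematic form, the TRI decomposition described above can be written as follows (a simplified sketch of the bifactor logic rather than the authors' exact model specification):

$$\mathrm{Self}_{k} = \lambda^{T}_{k}\,T + \lambda^{I}_{k}\,I + \varepsilon_{k}, \qquad \mathrm{Other}_{k'} = \lambda^{T}_{k'}\,T + \lambda^{R}_{k'}\,R + \varepsilon_{k'},$$

with the Trait factor T, the Identity factor I, and the Reputation factor R specified as mutually uncorrelated, so that correlating IM with T, I, and R separates associations with shared trait variance from associations with rater-specific variance.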

The first study provided a direct comparison of the respective capability of IM and the OCT to account for self-other discrepancies. To this end, 81 dyads (N = 162) comprising well-acquainted individuals were recruited as these should be able to provide the most accurate other-reports (e.g., Connelly & Ones, 2010) and contribute unique insights (e.g., McAbee & Connelly, 2016; Vazire, 2010). Participants provided self-reports of the German versions of the HEXACO (Ashton & Lee, 2009; Moshagen, Hilbig, & Zettler, 2014) and IM (Musch, Brockhaus, & Bröder, 2002; Paulhus, 1991), completed an overclaiming measure based on previously tested items (Dunlop et al., 2017; Musch, Ostapczuk, & Klaiber, 2012), and also rated their dyad partner on the HEXACO. To the extent that IM and the OCT assess response distortions, it was expected that they account for self-other disagreement especially on the HEXACO dimensions that can be considered as socially desirable, that is, honesty-humility and agreeableness. Thus, if self-reports are overly favorable in some individuals, these were expected to be revealed by well-acquainted others.

In general, self-other agreement was typical (e.g., Connelly & Ones, 2010), being highest for extraversion and lowest for honesty-humility. Thus, there were also self-other discrepancies that could be explained by measures of self-favoring response biases. To test this, IM and overclaiming, respectively, were regressed on self- and other-reports separately for each personality dimension using an approach introduced by Humberg et al. (2018), while controlling for the dyadic structure of the data via multilevel modelling. For IM and overclaiming to be positively associated with self-other discrepancies on a HEXACO dimension in the expected direction, regression coefficients of self- and other-reports must be ≠ 0, with coefficients of self-reports > 0 and coefficients of other-reports < 0.
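The logic of this test can be summarized, per personality dimension, in a single regression equation (a schematic sketch; the reported analyses additionally modeled the dyadic nesting):

$$\mathrm{IM}_i \;(\text{or } \mathrm{OC}_i) = b_0 + b_1\,\mathrm{Self}_i + b_2\,\mathrm{Other}_i + e_i,$$

where evidence that the measure captures overly favorable self-reports requires $b_1 > 0$ and $b_2 < 0$, that is, scores increase the more a self-report exceeds the corresponding other-report.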

Although the data showed substantial variation in differences between self- and other-reports of all personality dimensions and also in IM and overclaiming, neither of the two purported measures of response distortions accounted for self-other discrepancies. Of note, the varying characteristics of the relationships between dyad partners (e.g., relationship type, contact frequency, relationship length, or perceived closeness) did not act as moderators. As an additional test for self-presentation effects, it was examined whether IM or overclaiming suppressed the correlations between self- and other-reports of the HEXACO dimensions. Therefore, residuals of self-reports after regressing these on IM and overclaiming, respectively, were regressed on other-reports. Comparing the resulting slopes with those of regressing uncontrolled self-reports on other-reports, however, revealed no suppressor effects. The same applied when other-reports were regressed on residualized and uncontrolled self-reports, respectively. These results corroborated those from previous validation attempts involving a self-other framework regarding lie scales (e.g., Borkenau & Ostendorf, 1992; Piedmont et al., 2000) but opposed the findings reported concerning overclaiming (Paulhus et al., 2003). Notably, overclaiming and IM correlated negatively. Supporting the notion that IM assesses true virtues rather than bias (e.g., Zettler et al., 2015), IM was positively associated with both self- and other-reports of honesty-humility. Accordingly, controlling for IM in self-reports of HH reduced the strength of the relationship between self- and other-reports of HH. Overclaiming was associated with self-reports of openness but not with other-reports. However, as the test for overly favorable self-reports did not reach significance in the employed analysis, this association was rather in line with the proposed cognitive exploration account in that open individuals show stronger overclaiming (Dunlop et al., 2017) instead of individuals high in overclaiming pretending to be open.
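The suppression test described above can be illustrated with a minimal sketch (hypothetical variable names; ordinary least squares only, ignoring the dyadic nesting that the actual analysis handled via multilevel models):

```python
import numpy as np

def ols_slope(x, y):
    """Slope of regressing y on x (ordinary least squares)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

def suppression_check(self_report, other_report, bias_measure):
    """Compare the slope of self-reports regressed on other-reports with the slope
    obtained after residualizing self-reports for a purported bias measure
    (IM or overclaiming). A clearly larger slope for the residualized scores
    would indicate a suppressor effect of the bias measure."""
    self_report = np.asarray(self_report, float)
    bias_measure = np.asarray(bias_measure, float)
    b = ols_slope(bias_measure, self_report)
    residual_self = self_report - (self_report.mean()
                                   + b * (bias_measure - bias_measure.mean()))
    return ols_slope(other_report, self_report), ols_slope(other_report, residual_self)
```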

However, the approach of using deviations between self- and other-ratings of personality as a criterion discrepancy indicator entailed some limitations. To begin with, there are various other factors influencing self-other agreement apart from distorted self-reports, such as differing informational foundations (e.g., Funder, 1995). Most importantly, the study relied on rather well-acquainted individuals as other-raters. Whereas these appear to be the most accurate judges of others' personality in general (e.g., Connelly & Ones, 2010), they might also show a positivity bias influenced by their liking for the target (e.g., Wessels, Zimmermann, Biesanz, & Leising, 2018; Leising, Ostrovski, & Zimmermann, 2013). The latter effect might be particularly pronounced when targets nominate their other-rater themselves, as in the present study (Leising, Erbs, & Fritz, 2010). However, positivity biases in other-reports appear to be rather general (e.g., Leising, Gallrein, & Dufner, 2014; Zimmermann, Schindler, Klaus, & Leising, 2018) and might thus have rather negligible effects on rank orders of traits between targets. It is also possible that individuals providing overly favorable self-reports are successful impression managers towards others in day-to-day life and thereby obtain overlap between self- and other-perspectives (e.g., Danay & Ziegler, 2011; McAbee & Connelly, 2016; Nezlek, Schütz, & Sellin, 2007), even though close others might be able to see through such self-presentation. Thus, it could be criticized that the employed self-other paradigm was not an optimal testbed to detect self-favoring bias.

5.2 Study 2: True Virtue, Self-Presentation, or Both? – A Behavioral Test of Impression Management and Overclaiming

Apart from the gained insight that neither IM nor overclaiming accounted for self-other disagreement, a particularly interesting finding of the first study concerned the corresponding associations between IM and both self- and other-rated HH. In conjunction with previous reports (e.g., De Vries et al., 2014; Zettler et al., 2015), Study 1 lent credence to the controversial stance that high scores on lie scales could reflect the very opposite of lying, that is, being honest. However, other-reports may be biased as well (e.g., Leising et al., 2010) and the study by Zettler et al. (2015) was the only report of a direct association between IM and honest behavior. In contrast, both Study 1 and findings by Dunlop et al. (2017) suggest overclaiming to be independent of trait honesty. From a behavioral standpoint, however, claiming to know nonexistent terms might also be seen as dishonest behavior, given that the degree of knowledge reported is objectively overstated. Hence, one goal of the second study was to compare IM and the OCT in a behavioral paradigm to examine whether they are related to honest or dishonest behavior.

Beyond that, Study 2 provided another setting in which IM and the OCT could account for self-favoring response bias. Regardless of the possible confound with trait variance of personality dimensions such as honesty-humility, it is still possible for IM to also be sensitive to overly positive self-reports (e.g., Lönnqvist, Paunonen, Tuulio-Henriksson, Lönnqvist, & Verkasalo, 2007). Therefore, this study directly tested the competing interpretations of IM as a measure of bias or trait variance of honesty against each other. The same was tested for the HH scale. Even though HH is associated with trait-compatible behavior (e.g., Heck et al., 2018; Thielmann & Hilbig, 2018), it is conceivable that the measure is also sensitive to misrepresentation considering that its items contain highly (un)desirable behaviors akin to lie scales. While this issue was discussed by the authors of the HEXACO (e.g., Ashton et al., 2014; Lee & Ashton, 2004), it was rather dismissed in light of factors such as self-other agreement. However, Study 1 showed the latter to be lowest for HH. In addition, applicants differ from nonapplicants especially regarding their scores on HH (Anglim, Morse, De Vries, MacCann, & Marty, 2017). Lastly, overclaiming might still be able to detect response distortion in a narrower context compared to broad self-reports of personality. To test these differing accounts, a sample (N = 461) roughly representative of the German population was recruited.

Study 1 relied on other-reports as a reality criterion to validate self-reports, but both types of ratings are of a subjective nature. Hence, Study 2 was focused on more objective behavioral measures. To assess honest behavior, the same cheating task which was used by Zettler et al. (2015) as an adaptation of the die-under-cup paradigm (e.g., Fischbacher & Heusi, 2008; Hilbig & Hessler, 2013; Shalvi, Handgraaf, & De Dreu, 2011) was employed. In this task, individuals were given the opportunity to cheat for their personal financial gain. Therefore, participants were asked to take a conventional coin, choose one side as their target (i.e., heads or tails), then toss the coin twice, and indicate whether they tossed their target side both times, which made them eligible for a prize. Similar to RRT questioning, individuals responded completely anonymously, since individuals who actually won provided the same response as individuals who cheated. Importantly, the probability distribution of the randomization device (i.e., the stochastic probability of winning, in this case 25% for winning both coin tosses) was known and held constant (Moshagen, Hilbig, Erdfelder, & Moritz, 2014). Thereby, the proportion of dishonest individuals could be estimated by correcting for the disturbance due to honest winners (Moshagen & Hilbig, 2017) and analyses tailored to randomized response variables were applied (Heck & Moshagen, 2018).

To measure self-favoring response bias, a criterion discrepancy measure involving self-reported behavior and actual behavior was developed. Therefore, a standard variant of the dictator game (DG; e.g., Cherry, Frykblom, & Shogren, 2002; Forsythe, Horowitz, Savin, & Sefton, 1994) was used. Importantly, behavior in a DG can be interpreted as a measure of selfishness versus fairness/cooperation (e.g., Hilbig et al., 2015; Thielmann & Hilbig, 2018). In addition, extreme allocation decisions in a DG likely entail social value, since being selfish and greedy should be considered undesirable whereas being prosocial and cooperative should be seen as desirable. In the game, participants were asked to divide an amount of money between themselves and another participant. To obtain a self-report and a behavioral measure within the same paradigm, participants first played a hypothetical version of the DG in which they were asked to imagine how they would behave in the specific situation. Six weeks later, they played an incentivized version of the same scenario which now determined their individual payoff. Thus, the individual extent of self-favoring response bias was operationalized as the difference between the amount claimed to retain in the hypothetical scenario and the actually retained amount of money in the incentivized scenario, or, put differently, the difference between self-reported and exhibited selfish/prosocial behavior.

Judged against the stochastic probability of winning, cheating occurred, and DG discrepancies varied. However, overclaiming showed no associations with (dis)honest behavior, self-presentation in the hypothetical DG, or self-reported honesty-humility, nor, in contrast to Study 1, with IM. IM was again substantially associated with self-reported honesty, which was corroborated by a (small) negative correlation with the probability of cheating. This time, however, IM was also associated with self-presentation as it correlated negatively with the amount claimed to retain in the hypothetical scenario but was independent of the actually retained amount in the incentivized scenario. Hence, IM was related to greater discrepancies between self-report and behavior. As should be expected, HH was associated with honest behavior in the cheating task. However, it was also related to overly favorable self-presentation in the hypothetical DG. Even though allocation decisions in both the hypothetical and the incentivized DG were negatively associated with HH, such that individuals with higher scores on HH still tended to take less money for themselves, they nonetheless took more for themselves than initially claimed. Thus, IM and HH showed substantial overlap and a similar pattern of associations with the indicators of honest behavior and response bias. To examine them jointly, cheating and the DG variables, respectively, were regressed on both IM and HH. Results suggested that, when controlling for the respective other, only IM was associated with self-favoring response bias. In addition, controlling for HH eliminated the negative relation between cheating and IM.

Thus, results regarding IM revealed an ambiguity of interpretations. On the one hand, a virtuousness account was again supported, as the positive correlation with HH this time was corroborated by a replication of the negative association with dishonest behavior as reported by Zettler et al. (2015). In contrast to Study 1, however, IM also accounted for discrepancies between self-reported and actual behavior in that individuals high in IM tended to take more money for themselves than they had claimed to do beforehand. This notwithstanding, the circumstance that high scores on IM in fact allowed for two differing interpretations – with one of these being that the respective individuals tend to be truly honest – rather disqualified the utility of using IM as a means to detect and control for overly positive self-reports. Such an approach apparently introduces many cases of false positives and eliminates variance of trait honesty instead of bias. Interestingly though, IM and HH seemed to be able to correct each other. The HH scale acted as a validity measure in correcting the IM scale for its honesty variance component in that the negative correlation between IM and cheating vanished. Vice versa, the self-presentation effect on the criterion discrepancy measure associated with HH disappeared when controlling HH for IM. Nonetheless, some degree of response bias apparently also affected the HH scale. As in Study 1, results concerning overclaiming were disappointing since the OCT again failed to show any validity as a general measure of self-favoring response bias. As a consequence, a third study was conducted to shed light on what might influence overclaiming if not motivated misrepresentation.

5.3 Study 3: Overclaiming Shares Processes With the Hindsight Bias

In both Studies 1 and 2, there was variation in the extent of overclaiming, so that the effect itself occurred in a proportion of individuals. However, the validity of overclaiming as a measure of response bias was questioned by its apparent inability to account for overly positive self-reports. Accordingly, it is necessary to examine other possible explanations of the consistent effect that some individuals tend to claim knowledge of made-up words while others do not. Dunlop et al. (2017) illustrated how the effect might be a result of dispositional openness, which was rather supported by an association with self-views of being open in Study 1. The circumstance that this relation was not found for other-reports might have been due to the rather low observability of openness (e.g., Funder & Dobroth, 1987). Furthermore, Paulhus (2011) reported that individuals high in overclaiming also showed a global memory bias, which could lead them to experience a feeling of knowing regarding the overclaiming items. This study followed up on the latter possibility that overclaiming involves a cognitive bias.

As also noted by Dunlop et al. (2017), thinking that one knows a nonexistent term in an overclaiming task can also be viewed as the result of an illusion of familiarity instead of self-favoring response bias. This illusion could be based on the circumstance that the nonexistent items of overclaiming measures are deliberately designed to appear plausible and resemble the existing words in which they are embedded. As a result, lures might be fluently processed (Dunlop et al., 2017; Paulhus, 2011). Fluent processing of a meaningless term could, in turn, lead to cognitive dissonance which is resolved by misattributing the fluency to actual familiarity with the term (Whittlesea & Williams, 2001). Perceptual fluency reflects the ease, speed, and accuracy with which a stimulus is processed and has been found to influence recognition and information processing in various contexts (e.g., Alter & Oppenheimer, 2009; Begg, Anas, & Farinacci, 1992; Bernstein & Harley, 2007; Briñol, Petty, & Tormala, 2006; Jacoby & Dallas, 1981; Schwarz et al., 1991). Dunlop et al. (2017) put the notion that a memory bias is involved in overclaiming to a test, but overclaiming was found to be unrelated to the general tendency to perceive novel stimuli as familiar. However, the authors relied on a memory task that only contained nonexistent terms, for which participants had to decide whether they had been exposed to them in a previous round of the task (Whittlesea & Williams, 2000). Whereas this approach might measure an illusion of familiarity with unknown stimuli, it differs from the processes involved in an overclaiming task where new stimuli (i.e., the nonexistent terms) are surrounded by a majority of potentially familiar existent terms. Thus, the assumptions of Paulhus (2011) and Dunlop et al. (2017) could still hold in that overclaiming is, at least partly, a result of a cognitive bias in the form of an illusory feeling of familiarity.

To further explore the potential involvement of a cognitive bias, overclaiming was tested in conjunction with an established effect possibly attributable to similar processes, namely, the hindsight bias or the "knew-it-all-along effect" (e.g., Fischhoff, 1975; Fischhoff & Beyth, 1975). One common approach to measure this bias is to let individuals estimate the answers to knowledge questions, then present the correct answers, and then ask for recollections of their original estimates. This typically results in individuals giving distorted recollections that are closer to the correct answers (e.g., Hawkins & Hastie, 1990; Pohl & Hell, 1996). Importantly, this so-called memory design (Pohl, 2007) seems to be independent of motivational influences (e.g., Christensen-Szalanski & Willham, 1991; Musch, 2003; Pohl & Hell, 1996). Rather, hindsight bias in memory designs is believed to be a consequence of biased recollection and biased reconstruction (e.g., Dehn & Erdfelder, 1998; Erdfelder, Brandt, & Bröder, 2007; Erdfelder & Buchner, 1998; Schwarz & Stahlberg, 2003). Recollection biases occur when the correct answers that are presented distort the memory of the originally given answers (e.g., Fischhoff, 1975) or inhibit their accessibility (e.g., Hell, Gigerenzer, Gauggel, Mall, & Müller, 1988). Reconstruction biases occur when the correct answers alter the judgment processes that are involved in reconstructing the originally given answer upon failure to retrieve it from memory. Thereby, one's uncertain knowledge is updated due to the given information (e.g., Hoffrage, Hertwig, & Gigerenzer, 2000), so that the recalled estimate is biased towards this new informational basis.

What makes biased reconstruction particularly relevant is that its underlying processes are thought to involve a conscious sense-making of antecedent and reported outcome as well as a misattribution of fluent processing (e.g., Ash, 2009; Roese & Vohs, 2012; Werth & Strack, 2003). Sense-making means that individuals, when confronted with a new piece of information (here, the correct answers), automatically attempt to integrate this information into their memory and try to make sense of it with the goal of creating a causal explanation consistent with the new information. Similarly, in an overclaiming task under standard instructions, individuals do not expect to encounter nonexistent terms and any attempt to retrieve lures from memory will fail. Thus, they might also engage in sense-making to prevent dissonance, rendering the meaningless terms plausible (given the higher accessibility of information that is consistent with the term).

To test whether overclaiming and hindsight bias share underlying processes, participants (N = 157) completed an overclaiming task and a hindsight task by Pohl, Bayen, and Martin (2010). The latter task relied on the memory design in that participants gave numeric estimates to almanac-type questions in a first stage. In a second stage, they were asked to recollect their initial estimates while the correct answers were presented for half of the items. Hindsight bias occurred if recollected estimates were closer to the correct answers than those for items for which no correct answer was presented. Importantly, by using the HB13 model (Erdfelder & Buchner, 1998) to analyze the hindsight task, it was possible to distinguish the underlying psychological processes of biased recollection and biased reconstruction. The HB13 is a multinomial processing tree (MPT) model that can be applied to categorical behavioral data in order to explain observed category frequencies by a sequence of latent states interpreted as psychological processes (e.g., Batchelder & Riefer, 1999; Erdfelder et al., 2009; Riefer & Batchelder, 1988). In the HB13 model, biased recollection is defined as a decreased probability of correctly recalling one's initial estimate for those items that have been presented with the correct answer. Given a failure to retrieve the initial estimate, reconstruction may be biased by the correct answer with a certain probability, which represents the respective amount of reconstruction bias.
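
To illustrate the general MPT logic, a deliberately simplified sketch follows in which hypothetical response counts for answer-present items are fitted with a two-parameter tree separating a recollection probability from a reconstruction-bias probability. This is not the HB13 model itself, which comprises further parameters and response categories; the counts and parameter names are assumptions for illustration only.

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical counts for items presented with the correct answer:
    # correct recollection of the original estimate, recollection shifted
    # toward the correct answer, and other (unshifted) recollections.
    counts = np.array([52, 31, 74])

    def neg_log_likelihood(params):
        r, b = params  # r: recollection probability, b: reconstruction-bias probability
        probs = np.array([r,                    # original estimate correctly recalled
                          (1 - r) * b,          # recall fails, reconstruction biased toward the answer
                          (1 - r) * (1 - b)])   # recall fails, reconstruction unbiased
        return -np.sum(counts * np.log(np.clip(probs, 1e-12, 1.0)))

    fit = minimize(neg_log_likelihood, x0=[0.3, 0.3], bounds=[(1e-6, 1 - 1e-6)] * 2)
    r_hat, b_hat = fit.x
    print(f"recollection r = {r_hat:.2f}, reconstruction bias b = {b_hat:.2f}")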

Consistent with previous research (Erdfelder et al., 2007), hindsight bias occurred due to biased reconstruction but not due to biased recollection. In accordance with a cognitive bias account of overclaiming, individuals with a greater tendency to overclaim also exhibited a higher probability of biased reconstruction in the hindsight task, as evident in a positive correlation. Of note, albeit rather small, this correlation was of the magnitude typical for correlations involving cognitive tasks, which rarely exceed .30 and usually average around .20 due to comparatively low reliability (e.g., Christensen-Szalanski & Willham, 1991; Musch, 2003; Stanovich & West, 1998; Teovanović, Knežević, & Stankov, 2015). Based on this finding, high scores on an overclaiming measure cannot unambiguously be interpreted as reflecting self-favoring response bias. Instead, these individuals could have genuinely thought that they had already heard of the lures because they experienced elevated processing fluency, automatically integrated the terms into their knowledge, and tried to make sense of them in order to relieve cognitive dissonance.
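
The limiting role of reliability can be made explicit with the standard correction for attenuation; the reliability values in the worked example below are hypothetical and merely show how a modest observed correlation translates into a larger latent one.

\[ r_{\text{corrected}} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}} = \frac{.20}{\sqrt{.60 \times .60}} \approx .33 \]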

6 General Discussion

Psychological assessment based on self-reports faces the issue of potentially distorted responses due to overly favorable self-presentation. In reaction to this threat to the validity of the measurement, and to all consequences that are based on the assessment, a variety of measures has been proposed for the detection and control of self-favoring response bias in self-reports.

Unfortunately, there is an apparent lack of evidence as to whether these measures, which are intended to enhance the validity of self-reports, are valid themselves. The present dissertation addressed the question of whether two different approaches, lie scales such as IM (Paulhus, 1991) and the OCT (Paulhus et al., 2003), are suitable tools to assess self-serving response tendencies. With respect to the apparent divergence between intended purpose and actual performance of both lie scales and the OCT in previous validation studies (e.g., Borkenau & Ostendorf, 1992; Ludeke & Makransky, 2016) and the existence of alternative interpretations of high scores on both measures (e.g., Dunlop et al., 2017; Zettler et al., 2015), these notions were also put to a test.

6.1 Impression Management, or How to Assess Virtuousness and Response Bias Using a Single Scale

The lie scale approach has a long history and is based on the premise that individuals who endorse (too) many items representing virtuous attributes do so out of a motivation to present themselves in a favorable light and in accordance with social norms (e.g., Crowne & Marlowe, 1960). Whereas measures incorporating this reasoning should indeed be sensitive to overly positive self-presentations, they suffer from two major flaws. First, carriers of true virtues will also assign themselves higher values on these scales when responding honestly. Second, it is not possible to objectively determine criterion discrepancies, that is, whether individuals with high scores merely claim these attributes or whether they truly possess them. As a measure of response distortion, the IM scale produced inconsistent results. In Study 1, it failed to account for self-other discrepancies on the HEXACO personality inventory. In Study 2, it accounted for discrepancies between self-reported and actual behavior in a DG in that individuals with higher scores on IM acted more selfishly than initially claimed. More importantly, however, Studies 1 and 2 together also suggested a confound with trait variance of honesty, which is in line with an account of IM reflecting true virtues (De Vries et al., 2014; Zettler et al., 2015). This confound was indicated by sources arguably more informative than mere positive correlations between self-reported IM and HH scales. Namely, individuals high in IM (and HH) were also judged to be honest and humble by knowledgeable others (Study 1) and behaved more honestly when provided with an opportunity to cheat (Study 2). Thus, at best, IM seems to measure both true virtues and response bias.

The circumstance that lie scales are prone to a confound with personality traits has been acknowledged before (e.g., Paulhus, 2002) and has been highlighted by extensive reviews (Uziel, 2010) and meta-analytic findings (e.g., Connelly & Chang, 2016). In general, this confound means that controlling for a lie scale in another variable also removes relevant variance of personality traits. However, the apparent contamination with trait honesty completely negates the interpretability of IM scores. Paulhus and John (1998) attempted to reframe IM from a general indicator of self-presentation to a moralistic bias, confined to overly positive self-presentation with respect to moral and dutiful characteristics. This already highlighted the predominant content of the scale and hinted at the confound with honesty-humility. However, individuals high in IM are, essentially, assumed to be claiming uncommon virtues and discarding common faults, that is, being dishonest, which is the direct opposite of the interpretation suggested by the association with self- and other-reports of honesty-humility and honest behavior in a cheating paradigm.

Yet, the overlap between IM and the HH scale does not seem to be attributable solely to shared honesty variance. Although previous research has established that honesty-humility as measured in the HEXACO can be regarded as valid in predicting trait-compatible behaviors (e.g., Ashton et al., 2014; Heck et al., 2018; Thielmann & Hilbig, 2018), Study 2 indicated a small association with self-presentation for the HH scale as well. Due to their similarity, both IM and HH appear to contain variance of honesty and bias. It should be emphasized, however, that both behaviors in the hypothetical and incentivized DG were negatively associated with HH despite some extent of bias seemingly affecting the initial self-report. Thus, HH scores were still interpretable as reflecting individual dispositions in honesty and humility.


In contrast, high scores on IM are rendered ambiguous in light of the discrepant findings in Studies 1 and 2. Apparently, individuals can differ substantially in how they answer IM items. Highly honest individuals might adequately display their virtues and receive high scores. On the other hand, "real" impression managers will also receive high scores by claiming these virtues. In addition, there will be multiple combinations of varying extents of bias and honesty. As a result, individuals with high scores on the IM scale cannot necessarily be judged as presenting themselves overly favorably, but individuals doing so still appear to score highly on the scale as well, thereby also disqualifying IM as a valid measure of true virtues. Hence, flagging a self-report as invalid or not inviting an applicant to a job interview because of a high score on the IM scale has a certain potential for punishing truly virtuous individuals (and therefore, e.g., particularly suitable applicants) for honestly reporting their personality.

The present studies relied solely on IM as a representative of the lie scale approach due to its popularity and the circumstance that other similar instruments, such as the Marlowe-Crowne scale (Crowne & Marlowe, 1960) and its German version (Lück & Timaeus, 1969), can be considered rather outdated regarding their wording. It remains to be seen whether the contamination with honesty variance that has repeatedly been found for IM also applies to other lie scales.

This can be expected, however, since these scales adhere to the same underlying unlikely virtues principle. Still, it should be attempted to replicate the present findings with other operationalizations of dishonest behavior and self-presentation, while also directly comparing different lie scales with each other to maximize comparability and reach of possible conclusions.

After all, one could argue that there is still a slight hope with regard to the validity of IM as a measure of response bias. In spite of IM being largely contaminated with trait variance of honesty, Study 2 showed that this confound could be resolved by controlling IM for HH. Controlling for HH resulted in a seemingly valid residual of the IM scale, which was correlated with a higher propensity to cheat in addition to the still present association with overly favorable self-presentation in the hypothetical DG scenario. Consequently, future research should also examine whether this finding holds, that is, whether a valid IM measure might be achievable by simply including HH in the measurement. In that case, it might even be possible to construct an improved IM measure that is free of, or at least less confounded with, variance of trait honesty (i.e., less prone to produce false positives).
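
A minimal sketch of this residualization logic follows; the simulated scores and the simple linear partialling are illustrative assumptions and do not reproduce the exact analysis of Study 2.

    import numpy as np

    def residualize(im_scores, hh_scores):
        # Remove the linear HH component from IM via ordinary least squares.
        X = np.column_stack([np.ones_like(hh_scores), hh_scores])
        beta, *_ = np.linalg.lstsq(X, im_scores, rcond=None)
        return im_scores - X @ beta  # IM variance not shared with HH

    rng = np.random.default_rng(1)
    hh = rng.normal(size=200)                        # simulated honesty-humility scores
    im = 0.6 * hh + rng.normal(scale=0.8, size=200)  # simulated IM scores overlapping with HH
    im_residual = residualize(im, hh)
    print(round(np.corrcoef(im_residual, hh)[0, 1], 3))  # approximately zero by construction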

6.2 Overclaiming, or How to Assess Some Form of Openness and Biased Cognition

As opposed to IM, which at least partly appears to be another self-report measure of personality traits, overclaiming is intended to offer a "face valid index of faking" (Paulhus, 2011, p. 153), since individuals claiming to be familiar with bogus terms are considered to be objectively overstating a socially desirable attribute, that is, their general knowledge. Thanks to its measurement technique relying on existent terms from various knowledge domains in conjunction with nonexistent terms, an overclaiming measure itself comprises a form of criterion discrepancy. That is why the results of the first two studies were all the more disappointing, with overclaiming failing to account for self-favoring response bias regarding both operationalizations of criterion discrepancies as indicators of self-presentation. Instead, results of Study 3 indicated the involvement of a cognitive bias akin to the hindsight bias that would lead some individuals to genuinely think they had already heard of a bogus term. Even though this study only provides initial evidence and is therefore also in need of replication, Paulhus (2011) noted the possibility of a cognitive bias as part of the effect as well. The problem is, however, ultimately the same as the one that pertains to IM. Given that biased cognitive processes are accountable for high scores on an overclaiming measure, even if only in part, these scores cannot unambiguously be interpreted as, let alone attributed to, motivated misrepresentation.
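
Because the task mixes existent terms with foils, the criterion discrepancy can in principle be quantified within the measure itself, for instance with standard signal detection indices that separate knowledge accuracy from claiming bias. The sketch below uses hypothetical responses and textbook formulas (cf. Macmillan & Creelman, 2005) and is not necessarily identical to the scoring applied in the present studies.

    import numpy as np
    from scipy.stats import norm

    def overclaiming_indices(claims_real, claims_foil):
        # claims_real / claims_foil: 0/1 familiarity claims for existent terms and foils.
        # A loglinear correction keeps hit and false-alarm rates away from 0 and 1.
        hit = (claims_real.sum() + 0.5) / (claims_real.size + 1)
        fa = (claims_foil.sum() + 0.5) / (claims_foil.size + 1)
        accuracy = norm.ppf(hit) - norm.ppf(fa)       # d': discrimination of real terms from foils
        bias = -0.5 * (norm.ppf(hit) + norm.ppf(fa))  # c: lower values indicate more liberal claiming
        return fa, accuracy, bias

    rng = np.random.default_rng(3)
    real = rng.binomial(1, 0.70, size=60)  # hypothetical claims for 60 existent terms
    foil = rng.binomial(1, 0.25, size=20)  # hypothetical claims for 20 nonexistent terms
    print(overclaiming_indices(real, foil))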

Based on these three studies, the focus of future research should be on further examining contributing factors and underlying processes of the overclaiming effect rather than its suitability as a measure of response distortion. One contributing factor might be the misattribution of processing fluency to familiarity, which would be caused by the nonexistent terms being too believable for some individuals due to their resemblance to the existent items. This effect could thus be increased in individuals who have previously been exposed to more stimuli resembling the lures of an overclaiming task, that is, those who are more intellectually engaged and more knowledgeable. In other words, the involvement of a cognitive bias is fully compatible with an account of overclaiming as the consequence of dispositional cognitive exploration suggested by the consistent association with openness (Dunlop et al., 2017), which, in turn, exhibits a moderate relationship with general intelligence (e.g., Ackerman & Heggestad, 1997; DeYoung, 2011). Overall, the link between overclaiming and indicators of cognitive ability has been shown to be rather limited (Dunlop et al., 2017; Hülür et al., 2011; Ludeke & Makransky, 2016; Paulhus & Harms, 2004), but associations with crystallized intelligence/actual knowledge seem to be more pronounced (Bensch et al., 2019; Dunlop et al., 2017).

Furthermore, openness is associated with (over)confidence in one's knowledge and performance in cognitive tasks (e.g., Pallier et al., 2002; Kleitman & Stankov, 2007; Schaefer, Williams, Goodie, & Campbell, 2004), and friends' ratings of one's creativity and intelligence also appear to be more accurate than self-ratings (Vazire, 2010). Thus, individuals who are more open and intellectual might also be more confident in judging their knowledge on an overclaiming measure. Overclaiming itself is also positively related to overconfidence (e.g., Bensch et al., 2019) and has even been operationalized as a measure of the latter (e.g., Anderson, Brion, Moore, & Kennedy, 2012; Murphy et al., 2015). This notion fits reports that intellectual humility, which is conceptualized as the awareness of one's intellectual fallibility (Krumrei-Mancuso & Rouse, 2016), is negatively related to overclaiming (Alfano et al., 2017; Krumrei-Mancuso, Haggard, LaBouff, & Rowatt, 2019; but see Deffler, Leary, & Hoyle, 2016). In a related manner, recent findings indicate that stronger overclaiming loads on a common factor with higher bullshit receptivity, which is the tendency to ascribe deeper meaning to pseudo-profound but randomly generated sentences (e.g., "Wholeness quiets infinite phenomena"; Pennycook, Cheyne, Barr, Koehler, & Fugelsang, 2015), a higher propensity to fall for fake news, and less analytic thinking (Pennycook & Rand, 2019). Remarkably, the authors attributed these associations to fluent processing and to a tendency to be overly open to ideas.

Thus, the possibility that individuals high in overclaiming are dispositionally open, curious, and intellectual should be further explored. Specifically, associations with self-reported indicators of cognitive exploration should be corroborated by more diverse performance measures of possible outcomes of this disposition. Aside from actual knowledge, particularly interesting outcomes of trait cognitive exploration are creativity and divergent thinking (Carson, Peterson, & Higgins, 2005; Feist, 1998; McCrae, 1987; Silvia, Nusbaum, Berg, Martin, & O'Connor, 2009). Due to their creative nature, these individuals should be especially prone to sense-making processes with respect to the lures of an overclaiming task, in that they easily (i.e., fluently) come up with a factually false but plausible explanation of the bogus terms and where they might have encountered them before. Furthermore, if the misattribution of fluent processing is indeed an underlying cognitive process, manipulating the processing fluency should affect the magnitude of the overclaiming effect. For example, the effect should be reduced when the visual clarity of the items is limited, thereby slowing down and deepening the necessary processing (e.g., Alter, Oppenheimer, Epley, & Eyre, 2007; Reber & Schwarz, 1999; Reber & Zupanek, 2002).

Still, the present findings do not completely rule out that motivated misrepresentation contributes to the overclaiming effect as well. A possible explanation for why overclaiming failed to show validity as a measure of self-presentation in many validation approaches is that the effect and its measures are more context-dependent (Paulhus, 2011) than initially conceived (Paulhus et al., 2003). Items of overclaiming measures such as the one used in the present studies typically represent general knowledge domains (e.g., Hülür et al., 2011; Paulhus et al., 2003; Ziegler et al., 2013). Thus, the motivation to present oneself overly positively might be limited to appearing more intellectual and knowledgeable than one actually is. Importantly, this notion of a more specific intellectual overclaiming would not preclude these individuals from being factually high in openness. Quite the contrary, individuals who are open, curious, and more intellectually engaged than others likely also identify themselves as more intellectual and knowledgeable than others. Hence, they might have a certain motivation to feel that their abilities and personality are adequately presented in an assessment, resulting in overstating their knowledge.

This refined interpretation of motivated overclaiming would restrict the self-favoring effect to a form of intellectual narcissism. Interestingly, despite generally inconsistent findings regarding the relationship between overclaiming and narcissism (Bensch et al., 2019; Dunlop et al., 2017; Gebauer, Sedikides, Verplanken, & Maio, 2012; Ludeke & Makransky, 2016; Paulhus et al., 2003; Paulhus & Harms, 2004), Grosz, Lösch, and Back (2017) suggested a more nuanced association, as overclaiming was related to narcissism specific to intellectual ability (and social dominance). This might explain why the OCT showed some validity in higher-stakes contexts such as the Programme for International Student Assessment 2012 (PISA 2012; Organisation for Economic Co-operation and Development, 2014), in which overclaiming suppressed the association between familiarity with math concepts and math test scores (Vonkova, Papajoanu, & Stipek, 2018), or a simulated application context, in which it suppressed the association between Grade Point Average (GPA) and achievement-striving (Bing et al., 2011).
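
The suppression pattern can be illustrated with a small simulation in which self-reported familiarity is modeled as true knowledge plus an overclaiming component that is unrelated to the criterion; these data-generating assumptions are purely illustrative and are not taken from the cited studies.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 5000
    knowledge = rng.normal(size=n)               # latent true knowledge
    overclaiming = rng.normal(size=n)            # bias component, unrelated to the criterion
    familiarity = knowledge + overclaiming       # self-reported familiarity (contaminated)
    test_score = knowledge + rng.normal(size=n)  # objective criterion

    def ols_coefs(y, predictors):
        X = np.column_stack([np.ones(len(y))] + predictors)
        return np.linalg.lstsq(X, y, rcond=None)[0]

    print("familiarity alone:       ", round(ols_coefs(test_score, [familiarity])[1], 2))
    print("familiarity + suppressor:", round(ols_coefs(test_score, [familiarity, overclaiming])[1], 2))
    # Controlling for the overclaiming component strengthens the familiarity
    # coefficient (from about .5 to about 1.0), the signature of suppression.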

6.3 A Status Update, or How to (Not) Deal With Self-Favoring Response Bias

In general, it must be noted that all three studies are subject to a common limitation, namely, the exclusive use of low-demand situations in which individuals might not be particularly motivated to present themselves overly favorably. IM is supposed to reflect a motivation to impress others (Paulhus, 1984) but should also occur without a target audience (Paulhus, 2002). Overclaiming, on the other hand, might only reflect self-presentation when an audience is salient (Paulhus, 2011). Thus, one could be tempted to limit the applicability of the present results to low-demand situations. However, the context in which these results were obtained does not mitigate the confound of IM with trait honesty. The proportion of individuals high in IM due to self-favoring response bias rather than due to honest reporting of true virtues may be different in a high-demand setting, as more variance in response distortion can be expected in the latter. Nonetheless, dishonest responders will still be mixed with honest responders, who will also obtain high scores. At most, the IM scale in its present form might thus be of some (albeit limited) utility when used as a screening measure due to its sensitivity to self-presentation. For instance, an applicant with a high score on IM might be examined more thoroughly in a subsequent selection interview. In such applications, though, the low specificity that can be expected of IM scores must be taken into account. The same problem pertains to overclaiming, as biased cognitive processing will likely also occur in a high-stakes setting and potentially affect the scores on an overclaiming measure.


For now, the apparent invalidity of existing (alleged) validity measures ultimately leaves the question of how to validly and conveniently detect self-favoring response bias largely unanswered. In the worst-case scenario, it will not be feasible to improve IM by controlling for its honesty variance component via the HH scale, whereas overclaiming measures will be shown to belong rather to the nomological net of openness/intellect. Unfortunately, individuals with a self-serving response tendency cannot be expected to refrain from distorting their self-reports given the lack of simple ways to detect them.

Criterion discrepancy measures such as the one used in Study 2 seem to be capable of exposing some individuals as providing overly favorable self-reports, such as claiming to be prosocial without behaving accordingly. However, criterion discrepancies are comparatively difficult to obtain. Basically, one can only be certain that individuals exhibit a "departure from reality" (Paulhus, 2002, p. 49) if the relevant reality criterion is known. Whether individuals claiming to be extremely agreeable are distorting their self-reports is a tough call if they cannot be caught behaving to the contrary. Usually, however, one is not invited to the parties at which the self-reported extreme extravert may or may not be the center of attention, so it cannot be ruled out that the self-reported scores are a result of misrepresentation. A good friend of this individual should give a fairly realistic view of the target (e.g., Connelly & Ones, 2010) but will likely show a positivity bias as well (e.g., Wessels et al., 2018). In addition, other-reports should be particularly prone to social desirability when the other-raters know that their ratings are part of a job recruiting process or a legal examination in a child custody lawsuit.

It is possible that former applicants assign themselves less desirable self-reports of personality when retested outside of an application setting (e.g., Ellingson, Sackett, & Connelly, 2007; Griffith, Chmielowski, & Yoshita, 2007; Krammer, Sommer, & Arendasy, 2017), but then it is obviously too late to benefit from a more realistic self-report. Low levels of honesty-humility seem to be associated with distorting personality test scores in applicants (O'Neill et al., 2013) and with faking in job interviews (Bühl & Melchers, 2017; Roulin & Bourdage, 2017), but scores on the HH scale obtained in high-stakes testing are likely overly positive as well (Anglim, Lievens, Everton, Grant, & Marty, 2018; Anglim et al., 2017). Indirect questioning techniques (e.g., Moshagen et al., 2012; Yu et al., 2008) are a useful tool to obtain a more accurate prevalence estimate of a sensitive attribute because they reduce social desirability via enhanced anonymity, but they provide no information about an individual's extent of bias, so they cannot be used in contexts where response bias needs to be detected or controlled for.
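
To illustrate how such techniques recover a prevalence without identifying any individual respondent, consider the classic randomized response logic in its simplest form (shown here as a generic sketch, not as the specific models cited above): each respondent privately answers either the sensitive statement, with known probability p, or its negation, so that only the aggregate proportion of "yes" responses is informative,

\[ \lambda = p\,\pi + (1 - p)(1 - \pi) \quad\Longrightarrow\quad \hat{\pi} = \frac{\hat{\lambda} + p - 1}{2p - 1}, \qquad p \neq .5, \]

where \(\lambda\) denotes the probability of observing a "yes" response and \(\pi\) the prevalence of the sensitive attribute. Because the outcome of the randomization remains private, no single answer reveals a respondent's status, which is precisely why such designs yield no individual-level index of bias.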

One strategy to alleviate the potential for distorted self-reports might be to use personality measures that are designed to be less evaluative, thereby supposedly activating less motivation to respond in a socially desirable manner. For example, work by Bäckström and colleagues suggests that simply neutralizing the evaluativeness associated with items by rephrasing them reduces social desirability while retaining validity and internal consistency (Bäckström & Björklund, 2013; Bäckström, Björklund, & Larsson, 2009, 2014). Interestingly, this neutralization seems to reduce factor correlations and even to eliminate the presence of a general factor of personality (Bäckström & Björklund, 2016), supporting the idea that the latter contains overlapping bias variance (e.g., Danay & Ziegler, 2011; Davies, Connelly, Ones, & Birkland, 2015; MacCann, Pearce, & Jiang, 2017). However, neutralized measures are currently not widely available, pertain only to measuring the Big Five, and might also be more difficult to construct for traits that are highly evaluative, such as honesty-humility. Thus, the search for an easily implementable solution for controlling self-favoring response bias in the full variety of self-report assessments is to be continued.


7 Conclusion

Taken together, the present dissertation compared two popular approaches to assessing overly positive self-presentation in self-reports within a validation framework. The findings largely supported the skepticism that has been raised with regard to the use of lie scales. For IM as the prototype of this approach, it is quite possible that the scale measures the opposite of what it is supposed to, that is, individual differences in true virtuousness rather than the claiming of unlikely virtues. Yet, IM also exhibited sensitivity to the latter, so future work should build on the provided insights and examine whether the conflicting components within IM variance can be disentangled. Findings concerning overclaiming rather dashed hopes that the technique could live up to its potential as a more objective form of criterion discrepancy measure. In addition to its lack of validity in accounting for response distortion, alternative conceptualizations of the effect, such as a cognitive bias and dispositional cognitive exploration, should be the focus of upcoming research. In conclusion, based on the studies presented in this dissertation, referring to IM and the OCT as highly valid measures of self-favoring response bias appears to be an act of overclaiming and an overly favorable representation.


References

Ackerman, P. L., & Heggestad, E. D. (1997). Intelligence, personality, and interests: Evidence for

overlapping traits. Psychological Bulletin, 121, 219–245. https://doi.org/10.1037/0033-

2909.121.2.219

Alfano, M., Iurino, K., Stey, P., Robinson, B., Christen, M., Yu, F., & Lapsley, D. (2017).

Development and validation of a multi-dimensional measure of intellectual humility. PloS

ONE, 12, e0182950. https://doi.org/10.1371/journal.pone.0182950

Allport, G. W. (1928). A test for ascendance-submission. The Journal of Abnormal and Social

Psychology, 23, 118–136. https://doi.org/10.1037/h0074218

Alter, A. L., & Oppenheimer, D. M. (2009). Uniting the tribes of fluency to form a metacognitive

nation. Personality and Social Psychology Review, 13, 219–235.

https://doi.org/10.1177/1088868309341564

Alter, A. L., Oppenheimer, D. M., Epley, N., & Eyre, R. N. (2007). Overcoming intuition:

Metacognitive difficulty activates analytic reasoning. Journal of Experimental

Psychology: General, 136, 569–576. https://doi.org/10.1037/0096-3445.136.4.569

Anderson, C., Brion, S., Moore, D. A., & Kennedy, J. A. (2012). A status-enhancement account

of overconfidence. Journal of Personality and Social Psychology, 103, 718–735.

https://doi.org/10.1037/a0029395

Anglim, J., Lievens, F., Everton, L., Grant, S. L., & Marty, A. (2018). HEXACO personality

predicts counterproductive work behavior and organizational citizenship behavior in low-

stakes and job applicant contexts. Journal of Research in Personality, 77, 11–20.

https://doi.org/10.1016/j.jrp.2018.09.003

Anglim, J., Morse, G., De Vries, R. E., MacCann, C., & Marty, A. (2017). Comparing job

applicants to non-applicants using an item-level bifactor model on the HEXACO


personality inventory. European Journal of Personality, 31, 669–684.

https://doi.org/10.1002/per.2120

Asendorpf, J. B., & Ostendorf, F. (1998). Is self-enhancement healthy? Conceptual,

psychometric, and empirical analysis. Journal of Personality and Social Psychology, 74,

955–966. https://doi.org/10.1037/0022-3514.74.4.955

Ash, I. K. (2009). Surprise, memory, and retrospective judgment making: Testing cognitive

reconstruction theories of the hindsight bias effect. Journal of Experimental Psychology:

Learning, Memory, and Cognition, 35, 916–933. https://doi.org/10.1037/a0015504

Ashton, M. C., & Lee, K. (2008). The prediction of honesty-humility-related criteria by the

HEXACO and five-factor models of personality. Journal of Research in Personality, 42,

1216–1228. https://doi.org/10.1016/j.jrp.2008.03.006

Ashton, M. C., & Lee, K. (2009). The HEXACO-60: A short measure of the major dimensions of

personality. Journal of Personality Assessment, 91, 340–345.

https://doi.org/10.1080/00223890902935878

Ashton, M. C., Lee, K., & de Vries, R. E. (2014). The HEXACO honesty-humility,

agreeableness, and emotionality factors: A review of research and theory. Personality and

Social Psychology Review, 18, 139–152. https://doi.org/10.1177/1088868314523838

Atir, S., Rosenzweig, E., & Dunning, D. (2015). When knowledge knows no bounds: Self-

perceived expertise predicts claims of impossible knowledge. Psychological Science, 26,

1295–1303. https://doi.org/10.1177/0956797615588195

Bäckström, M., & Björklund, F. (2013). Social desirability in personality inventories: Symptoms,

diagnosis and prescribed cure. Scandinavian Journal of Psychology, 54, 152–159.

https://doi.org/10.1111/sjop.12015


Bäckström, M., & Björklund, F. (2016). Is the general factor of personality based on evaluative

responding? Experimental manipulation of item-popularity in personality inventories.

Personality and Individual Differences, 96, 31–35.

https://doi.org/10.1016/j.paid.2016.02.058

Bäckström, M., Björklund, F., & Larsson, M. R. (2009). Five-factor inventories have a major

general factor related to social desirability which can be reduced by framing items

neutrally. Journal of Research in Personality, 43, 335–344.

https://doi.org/10.1016/j.jrp.2008.12.013

Bäckström, M., Björklund, F., & Larsson, M. R. (2014). Criterion validity is maintained when

items are evaluatively neutralized: Evidence from a full‐scale five‐factor model inventory.

European Journal of Personality, 28, 620–633. https://doi.org/10.1002/per.1960

Barber, L. K., Barnes, C. M., & Carlson, K. D. (2013). Random and systematic error effects of

insomnia on survey behavior. Organizational Research Methods, 16, 616–649.

https://doi.org/10.1177/1094428113493120

Batchelder, W. H., & Riefer, D. M. (1999). Theoretical and empirical review of multinomial

process tree modeling. Psychonomic Bulletin & Review, 6, 57–86.

https://doi.org/10.3758/BF03210812

Baumeister, R. F., Vohs, K. D., & Funder, D. C. (2007). Psychology as the science of self-reports

and finger movements: Whatever happened to actual behavior? Perspectives on

Psychological Science, 2, 396–403. https://doi.org/10.1111/j.1745-6916.2007.00051.x

Begg, I. M., Anas, A., & Farinacci, S. (1992). Dissociation of processes in belief: Source

recollection, statement familiarity, and the illusion of truth. Journal of Experimental

Psychology: General, 121, 446–458. https://doi.org/10.1037/0096-3445.121.4.446


Bensch, D., Paulhus, D. L., Stankov, L., & Ziegler, M. (2019). Teasing apart overclaiming,

overconfidence, and socially desirable responding. Assessment, 26, 351–363.

https://doi.org/10.1177/1073191117700268.

Bernreuter, R. G. (1933). Validity of the personality inventory. Personnel Journal, 11, 383–386.

Bernstein, D. M., & Harley, E. M. (2007). Fluency misattribution and visual hindsight bias.

Memory, 15, 548–560. https://doi.org/10.1080/09658210701390701

Bing, M. N., Kluemper, D., Davison, H. K., Taylor, S., & Novicevic, M. (2011). Overclaiming as

a measure of faking. Organizational Behavior and Human Decision Processes, 116, 148–

162. https://doi.org/10.1016/j.obhdp.2011.05.006

Blackman, M. C., & Funder, D. C. (1998). The effect of information on consensus and accuracy

in personality judgment. Journal of Experimental Social Psychology, 34, 164–181.

https://doi.org/10.1006/jesp.1997.1347

Block, J. (1965). The challenge of response sets: Unconfounding meaning, acquiescence, and

social desirability in the MMPI. East Norwalk, CT: Appleton-Century-Crofts.

Borkenau, P., & Ostendorf, F. (1992). Social desirability scales as moderator and suppressor

variables. European Journal of Personality, 6, 199–214.

https://doi.org/10.1002/per.2410060303

Borkenau, P., & Zaltauskas, K. (2009). Effects of self-enhancement on agreement on personality

profiles. European Journal of Personality, 23, 107–123. https://doi.org/10.1002/per.707

Briñol, P., Petty, R. E., & Tormala, Z. L. (2006). The malleable meaning of subjective ease.

Psychological Science, 17, 200–206. https://doi.org/10.1111/j.1467-9280.2006.01686.x

Bühl, A. K., & Melchers, K. G. (2017). Individual difference variables and the occurrence and

effectiveness of faking behavior in interviews. Frontiers in Psychology, 8, 686.

https://doi.org/10.3389/fpsyg.2017.00686


Carson, S. H., Peterson, J. B., & Higgins, D. M. (2005). Reliability, validity, and factor structure

of the Creative Achievement Questionnaire. Creativity Research Journal, 17, 37–50.

https://doi.org/10.1207/s15326934crj1701_4

Cherry, T. L., Frykblom, P., & Shogren, J. F. (2002). Hardnose the dictator. American Economic

Review, 92, 1218–1221. https://doi.org/10.1257/00028280260344740

Christensen-Szalanski, J. J., & Willham, C. F. (1991). The hindsight bias: A meta-analysis.

Organizational Behavior and Human Decision Processes, 48, 147–168.

https://doi.org/10.1016/0749-5978(91)90010-Q

Christiansen, N. D., Goffin, R. D., Johnston, N. G., & Rothstein, M. G. (1994). Correcting the

16PF for faking: Effects on criterion‐related validity and individual hiring decisions.

Personnel Psychology, 47, 847–860. https://doi.org/10.1111/j.1744-6570.1994.tb01581.x

Colvin, C. R., & Block, J. (1994). Do positive illusions foster mental health? An examination of

the Taylor and Brown formulation. Psychological Bulletin, 116, 3–20.

https://doi.org/10.1037/0033-2909.116.1.3

Colvin, C. R., Block, J., & Funder, D. C. (1995). Overly positive self-evaluations and personality:

Negative implications for mental health. Journal of Personality and Social Psychology,

68, 1152–1162. https://doi.org/10.1037//0022-3514.68.6.1152

Connelly, B. S., & Chang, L. (2016). A meta‐analytic multitrait multirater separation of substance

and style in social desirability scales. Journal of Personality, 84, 319–334.

https://doi.org/10.1111/jopy.12161

Connelly, B. S., & Ones, D. S. (2010). An other perspective on personality: Meta-analytic

integration of observers’ accuracy and predictive validity. Psychological Bulletin, 136,

1092–1122. https://doi.org/10.1037/a0021212


Connolly, J. J., Kavanagh, E. J., & Viswesvaran, C. (2007). The convergent validity between self

and observer ratings of personality: A meta-analytic review. International Journal of

Selection and Assessment, 15, 110–117. https://doi.org/10.1111/j.1468-

2389.2007.00371.x

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological

Bulletin, 52, 281–302. https://doi.org/10.1037/h0040957

Crowne, D. P., & Marlowe, D. (1960). A new scale of social desirability independent of

psychopathology. Journal of Consulting Psychology, 24, 349–354.

Crowne, D. P., & Marlowe, D. (1964). The approval motive: Studies in evaluative dependence.

New York, NY: Wiley.

Danay, E., & Ziegler, M. (2011). Is there really a single factor of personality? A multirater

approach to the apex of personality. Journal of Research in Personality, 45, 560–567.

https://doi.org/10.1016/j.jrp.2011.07.003

Davies, S. E., Connelly, B. S., Ones, D. S., & Birkland, A. S. (2015). The general factor of

personality: The “Big One,” a self-evaluative trait, or a methodological gnat that won’t go

away? Personality and Individual Differences, 81, 13–22.

https://doi.org/10.1016/j.paid.2015.01.006

Deffler, S. A., Leary, M. R., & Hoyle, R. H. (2016). Knowing what you know: Intellectual

humility and judgments of recognition memory. Personality and Individual Differences,

96, 255–259. https://doi.org/10.1016/j.paid.2016.03.016

Dehn, D. M., & Erdfelder, E. (1998). What kind of bias is hindsight bias? Psychological

Research, 61, 135–146. https://doi.org/10.1007/s004260050020

De Vries, R. E., Realo, A., & Allik, J. (2016). Using personality item characteristics to predict

single-item internal reliability, retest reliability, and self-other agreement. European

Journal of Personality, 30, 618–636. https://doi.org/10.1002/per.2083

De Vries, R. E., & van Gelder, J. L. (2013). Tales of two self-control scales: Relations with five-

factor and HEXACO traits. Personality and Individual Differences, 54, 756–760.

http://dx.doi.org/10.1016/j.paid.2012.12.023

De Vries, R. E., Zettler, I., & Hilbig, B. E. (2014). Rethinking trait conceptions of social

desirability scales: Impression management as an expression of honesty-humility.

Assessment, 21, 286–299. https://doi.org/10.1177/1073191113504619

DeYoung, C. G. (2011). Intelligence and personality. In R. J. Sternberg & S. B. Kaufman (Eds.),

Cambridge handbooks in psychology. The Cambridge handbook of intelligence (pp. 711–

737). New York, NY: Cambridge University Press.

https://doi.org/10.1017/CBO9780511977244.036

Donovan, J. J., Dwight, S. A., & Schneider, D. (2014). The impact of applicant faking on

selection measures, hiring decisions, and employee performance. Journal of Business and

Psychology, 29, 479–493. https://doi.org/10.1007/s10869-013-9318-5

Dunlop, P. D., Bourdage, J. S., de Vries, R. E., Hilbig, B. E., Zettler, I., & Ludeke, S. G. (2017).

Openness to (reporting) experiences that one never had: Overclaiming as an outcome of

the knowledge accumulated through a proclivity for cognitive and aesthetic exploration.

Journal of Personality and Social Psychology, 113, 810–834.

https://doi.org/10.1037/pspp0000110

Edwards, A. L. (1957). The social desirability variable in personality assessment and research.

New York, NY: The Dryden Press.


Edwards, A. L. (1990). Construct validity and social desirability. American Psychologist, 45,

287–289. https://doi.org/10.1037/0003-066X.45.2.287

Ellingson, J. E., Sackett, P. R., & Connelly, B. S. (2007). Personality assessment across selection

and development contexts: Insights into response distortion. Journal of Applied

Psychology, 92, 386–395. https://doi.org/10.1037/0021-9010.92.2.386

Erdfelder, E., Auer, T. S., Hilbig, B. E., Aßfalg, A., Moshagen, M., & Nadarevic, L. (2009).

Multinomial processing tree models: A review of the literature. Zeitschrift für

Psychologie/Journal of Psychology, 217, 108–124. https://doi.org/10.1027/0044-

3409.217.3.108

Erdfelder, E., Brandt, M., & Bröder, A. (2007). Recollection biases in hindsight judgments.

Social Cognition, 25, 114–131. https://doi.org/10.1521/soco.2007.25.1.114

Erdfelder, E., & Buchner, A. (1998). Decomposing the hindsight bias: A multinomial processing

tree model for separating recollection and reconstruction in hindsight. Journal of

Experimental Psychology: Learning, Memory, and Cognition, 24, 387–414.

https://doi.org/10.1037/0278-7393.24.2.387

Eysenck, S. B., Eysenck, H. J., & Barrett, P. (1985). A revised version of the psychoticism scale.

Personality and Individual Differences, 6, 21–29. https://doi.org/10.1016/0191-

8869(85)90026-1

Feeney, J. R., & Goffin, R. D. (2015). The Overclaiming Questionnaire: A good way to measure

faking? Personality and Individual Differences, 82, 248–252.

https://doi.org/10.1016/j.paid.2015.03.038

Feist, G. J. (1998). A meta-analysis of personality in scientific and artistic creativity. Personality

and Social Psychology Review, 2, 290–309.

https://doi.org/10.1207/s15327957pspr0204_5


Fischbacher, U., & Heusi, F. (2008). Lies in disguise, an experimental study on cheating.

Research Paper Series, Thurgau Institute of Economics and Department of Economics,

University of Konstanz. Retrieved from:

http://www.twikreuzlingen.ch/uploads/tx_cal/media/TWI-RPS-040-Fischbacher-Heusi-

2008-11.pdf.

Fischhoff, B. (1975). Hindsight is not equal to foresight: The effect of outcome knowledge on

judgment under uncertainty. Journal of Experimental Psychology: Human Perception and

Performance, 1, 288–299. https://doi.org/10.1037/0096-1523.1.3.288

Fischhoff, B., & Beyth, R. (1975). “I knew it would happen”: Remembered probabilities of once-

future things. Organizational Behavior & Human Performance, 13, 1–16.

https://doi.org/10.1016/0030-5073(75)90002-1

Forsythe, R., Horowitz, J. L., Savin, N. E., & Sefton, M. (1994). Fairness in simple bargaining

experiments. Games and Economic Behavior, 6, 347–369.

https://doi.org/10.1006/game.1994.1021

Funder, D. C. (1995). On the accuracy of personality judgment: A realistic approach.

Psychological Review, 102, 652–670. https://doi.org/10.1037/0033-295X.102.4.652

Funder, D. C., & Dobroth, K. M. (1987). Differences between traits: Properties associated with

interjudge agreement. Journal of Personality and Social Psychology, 52, 409–418.

https://doi.org/10.1037/0022-3514.52.2.409

Funder, D. C., Kolar, D. C., & Blackman, M. C. (1995). Agreement among judges of personality:

Interpersonal relations, similarity, and acquaintanceship. Journal of Personality and

Social Psychology, 69, 656–672. https://doi.org/10.1037/0022-3514.69.4.656


Gebauer, J. E., Sedikides, C., Verplanken, B., & Maio, G. R. (2012). Communal narcissism.

Journal of Personality and Social Psychology, 103, 854–878.

https://doi.org/10.1037/a0029629

Gibson, H. B. (1962). The lie scale of the Maudsley Personality Inventory. Acta Psychologica,

20, 18–23. https://doi.org/10.1016/0001-6918(62)90003-3

Goffin, R. D., & Boyd, A. C. (2009). Faking and personality assessment in personnel selection:

Advancing models of faking. Canadian Psychology/Psychologie Canadienne, 50, 151–

160. https://doi.org/10.1037/a0015946

Griffith, R. L., Chmielowski, T., & Yoshita, Y. (2007). Do applicants fake? An examination of

the frequency of applicant faking behavior. Personnel Review, 36, 341–355.

https://doi.org/10.1108/00483480710731310

Grosz, M. P., Lösch, T., & Back, M. D. (2017). A dimension- and content-specific approach to

investigate the narcissism-overclaiming link. Journal of Research in Personality, 70, 134–

138. https://doi.org/10.1016/j.jrp.2017.05.006

Hartshorne, H., & May, M. (1928). Studies in deceit. New York, NY: MacMillan.

Hathaway, S. R., & McKinley, J. C. (1951). Minnesota Multiphasic Personality Inventory

manual, revised. San Antonio, TX: Psychological Corporation.

Hawkins, S. A., & Hastie, R. (1990). Hindsight: Biased judgments of past events after the

outcomes are known. Psychological Bulletin, 107, 311–327. https://doi.org/10.1037/0033-

2909.107.3.311

Heck, D. W., & Moshagen, M. (2018). RRreg: An R package for correlation and regression

analyses of randomized response data. Journal of Statistical Software, 85, 1–29.

https://doi.org/10.18637/jss.v085.i02


Heck, D. W., Thielmann, I., Moshagen, M., & Hilbig, B. E. (2018). Who lies? A large-scale

reanalysis linking basic personality traits to unethical decision making. Judgment and

Decision Making, 13, 356–371.

Hell, W., Gigerenzer, G., Gauggel, S., Mall, M., & Müller, M. (1988). Hindsight bias: An

interaction of automatic and motivational factors? Memory & Cognition, 16, 533–538.

https://doi.org/10.3758/BF03197054

Hilbig, B. E., & Hessler, C. M. (2013). What lies beneath: How the distance between truth and lie

drives dishonesty. Journal of Experimental Social Psychology, 49, 263–266.

https://doi.org/10.1016/j.jesp.2012.11.010

Hilbig, B. E., Thielmann, I., Wührl, J., & Zettler, I. (2015). From honesty-humility to fair

behavior – Benevolence or a (blind) fairness norm? Personality and Individual

Differences, 80, 91–95. https://doi.org/10.1016/j.paid.2015.02.017

Hoffmann, A., Diedenhofen, B., Verschuere, B., & Musch, J. (2015). A strong validation of the

crosswise model using experimentally-induced cheating behavior. Experimental

Psychology, 62, 403–414. https://doi.org/10.1027/1618-3169/a000304

Hoffrage, U., Hertwig, R., & Gigerenzer, G. (2000). Hindsight bias: A by-product of knowledge

updating? Journal of Experimental Psychology: Learning, Memory, and Cognition, 26,

566–581. https://doi.org/10.1037/0278-7393.26.3.566

Holzinger, K. J., & Swineford, F. (1937). The bi-factor method. Psychometrika, 2, 41–54.

https://doi.org/10.1007/BF02287965

Hoorens, V. (1995). Self‐favoring biases, self‐presentation, and the self‐other asymmetry in

social comparison. Journal of Personality, 63, 793–817. https://doi.org/10.1111/j.1467-

6494.1995.tb00317.x


Hough, L. M. (1998). Effects of intentional distortion in personality measurement and evaluation

of suggested palliatives. Human Performance, 11, 209–244.

https://doi.org/10.1080/08959285.1998.9668032

Hough, L. M., & Oswald, F. L. (2008). Personality testing and industrial-organizational

psychology: Reflections, progress, and prospects. Industrial and Organizational

Psychology, 1, 272–290. https://doi.org/10.1111/j.1754-9434.2008.00048.x

Hülür, G., Wilhelm, O., & Schipolowski, S. (2011). Prediction of self-reported knowledge with

over-claiming, fluid and crystallized intelligence and typical intellectual engagement.

Learning and Individual Differences, 21, 742–746.

https://doi.org/10.1016/j.lindif.2011.09.006

Humberg, S., Dufner, M., Schönbrodt, F. D., Geukes, K., Hutteman, R., van Zalk, M. H. W., …

Back, M. D. (2018). Enhanced versus simply positive: A new condition-based regression

analysis to disentangle effects of self-enhancement from effects of positivity of self-view.

Journal of Personality and Social Psychology, 114, 303–322.

https://doi.org/10.1037/pspp0000134

Iacono, W. G. (2000). The detection of deception. In J. T. Cacioppo, L. G. Tassinary, & G. G.

Berntson (Eds.), Handbook of psychophysiology (pp. 772–793). New York, NY:

Cambridge University Press.

Iacono, W. G. (2008). Effective policing: Understanding how polygraph tests work and are used.

Criminal Justice and Behavior, 35, 1295–1308.

https://doi.org/10.1177/0093854808321529

Jackson, D. N., & Messick, S. (1958). Content and style in personality assessment. Psychological

Bulletin, 55, 243–252. https://doi.org/10.1037/h0045996


Jacoby, L. L., & Dallas, M. (1981). On the relationship between autobiographical memory and

perceptual learning. Journal of Experimental Psychology: General, 110, 306–340.

https://doi.org/10.1037/0096-3445.110.3.306

John, O. P., & Robins, R. W. (1993). Determinants of interjudge agreement on personality traits:

The Big Five domains, observability, evaluativeness, and the unique perspective of the

self. Journal of Personality, 61, 521–551. https://doi.org/10.1111/j.1467-

6494.1993.tb00781.x

John, O. P., & Robins, R. W. (1994). Accuracy and bias in self-perception: Individual differences

in self-enhancement and the role of narcissism. Journal of Personality and Social

Psychology, 66, 206–219. https://doi.org/10.1037/0022-3514.66.1.206

Jones, E. E., & Sigall, H. (1971). The bogus pipeline: A new paradigm for measuring affect and

attitude. Psychological Bulletin, 76, 349–364. https://doi.org/10.1037/h0031617

Kam, C., Risavy, S. D., & Perunovic, W. E. (2015). Using over-claiming technique to probe

social desirability ratings of personality items: A validity examination. Personality and

Individual Differences, 74, 177–181. https://doi.org/10.1016/j.paid.2014.10.017

Kemper, C. J., & Menold, N. (2014). Nuisance or remedy? The utility of stylistic responding as

an indicator of data fabrication in surveys. Methodology, 10, 92–99.

https://doi.org/10.1027/1614-2241/a000078

Kleitman, S., & Stankov, L. (2007). Self-confidence and metacognitive processes. Learning and

Individual Differences, 17, 161–173. https://doi.org/10.1016/j.lindif.2007.03.004

Konstabel, K., Aavik, T., & Allik, J. (2006). Social desirability and consensual validity of

personality traits. European Journal of Personality, 20, 549–566.

https://doi.org/10.1002/per.593


Krammer, G., Sommer, M., & Arendasy, M. E. (2017). The psychometric costs of applicants’

faking: Examining measurement invariance and retest correlations across response

conditions. Journal of Personality Assessment, 99, 510–523.

https://doi.org/10.1080/00223891.2017.1285781

Krumrei-Mancuso, E. J., Haggard, M. C., LaBouff, J. P., & Rowatt, W. C. (2019). Links between

intellectual humility and acquiring knowledge. The Journal of Positive Psychology.

Advance online publication. https://doi.org/10.1080/17439760.2019.1579359

Krumrei-Mancuso, E. J., & Rouse, S. V. (2016). The development and validation of the

Comprehensive Intellectual Humility Scale. Journal of Personality Assessment, 98, 209–

221. https://doi.org/10.1080/00223891.2015.1068174

Kurt, A., & Paulhus, D. L. (2008). Moderators of the adaptiveness of self-enhancement:

Operationalization, motivational domain, adjustment facet, and evaluator. Journal of

Research in Personality, 42, 839–853. https://doi.org/10.1016/j.jrp.2007.11.005

Kurtz, J. E., Tarquini, S. J., & Iobst, E. A. (2008). Socially desirable responding in personality

assessment: Still more substance than style. Personality and Individual Differences, 45,

22–27. https://doi.org/10.1016/j.paid.2008.02.012

Kwan, V. S. Y., John, O. P., Kenny, D. A., Bond, M. H., & Robins, R. W. (2004).

Reconceptualizing individual differences in self-enhancement bias: An interpersonal

approach. Psychological Review, 111, 94–110. https://doi.org/10.1037/0033-

295X.111.1.94

Lee, K., & Ashton, M. C. (2004). Psychometric properties of the HEXACO personality

inventory. Multivariate Behavioral Research, 39, 329–358.

https://doi.org/10.1207/s15327906mbr3902_8


Leising, D., Erbs, J., & Fritz, U. (2010). The letter of recommendation effect in informant ratings

of personality. Journal of Personality and Social Psychology, 98, 668–682.

https://doi.org/10.1037/a0018771

Leising, D., Gallrein, A. M. B., & Dufner, M. (2014). Judging the behavior of people we know:

Objective assessment, confirmation of preexisting views, or both? Personality and Social

Psychology Bulletin, 40, 153–163. https://doi.org/10.1177/0146167213507287

Leising, D., Locke, K. D., Kurzius, E., & Zimmermann, J. (2016). Quantifying the association of

self-enhancement bias with self-ratings of personality and life satisfaction. Assessment,

23, 588–602. https://doi.org/10.1177/1073191115590852

Leising, D., Ostrovski, O., & Zimmermann, J. (2013). “Are we talking about the same person

here?”: Interrater agreement in judgments of personality varies dramatically with how

much the perceivers like the targets. Social Psychological and Personality Science, 4,

468–474. https://doi.org/10.1177/1948550612462414

Li, A., & Bagger, J. (2006). Using the BIDR to distinguish the effects of impression management

and self-deception on the criterion validity of personality measures: A meta-analysis.

International Journal of Selection and Assessment, 14, 131–141.

https://doi.org/10.1111/j.1468-2389.2006.00339.x

Lönnqvist, J. E., Paunonen, S., Tuulio-Henriksson, A., Lönnqvist, J., & Verkasalo, M. (2007).

Substance and style in socially desirable responding. Journal of Personality, 75, 291–322.

https://doi.org/10.1111/j.1467-6494.2006.00440.x

Lück, H. E., & Timaeus, E. (1969). Skalen zur Messung manifester Angst (MAS) und sozialer

Wünschbarkeit (SDS-E und SDS-CM) [Scales for the measurement of manifest anxiety

(MAS) and social desirability (SDS-E and SDS-CM)]. Diagnostica, 15, 134–141.


Ludeke, S. G., & Makransky, G. (2016). Does the Over-Claiming Questionnaire measure

overclaiming? Absent convergent validity in a large community sample. Psychological

Assessment, 28, 765–774. https://doi.org/10.1037/pas0000211

MacCann, C., Pearce, N., & Jiang, Y. (2017). The general factor of personality is stronger and

more strongly correlated with cognitive ability under instructed faking. Journal of

Individual Differences, 38, 46–54. https://doi.org/10.1027/1614-0001/a000221

Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user’s guide (2nd ed.).

Mahwah, NJ: Erlbaum.

McAbee, S. T., & Connelly, B. S. (2016). A multi-rater framework for studying personality: The

trait-reputation-identity model. Psychological Review, 123, 569–591.

https://doi.org/10.1037/rev0000035

McCrae, R. R. (1987). Creativity, divergent thinking, and openness to experience. Journal of

Personality and Social Psychology, 52, 1258–1265. https://doi.org/10.1037/0022-

3514.52.6.1258

McCrae, R. R., & Costa, P. T. (1983). Social desirability scales: More substance than style.

Journal of Consulting and Clinical Psychology, 51, 882–888.

https://doi.org/10.1037/0022-006X.51.6.882

Meehl, P. E., & Hathaway, S. R. (1946). The K factor as a suppressor variable in the Minnesota

Multiphasic Personality Inventory. Journal of Applied Psychology, 30, 525–564.

https://doi.org/10.1037/h0053634

Mesmer-Magnus, J., Viswesvaran, C., Deshpande, S., & Joseph, J. (2006). Social desirability:

The role of over-claiming, self-esteem, and emotional intelligence. Psychology Science,

48, 336–356.


Miller, J. D., Pilkonis, P. A., & Clifton, A. (2005). Self- and other-reports of traits from the five-

factor model: Relations to personality disorder. Journal of Personality Disorders, 19,

400–419. https://doi.org/10.1521/pedi.2005.19.4.400

Moshagen, M., & Hilbig, B. E. (2017). The statistical analysis of cheating paradigms. Behavior

Research Methods, 49, 724–732. https://doi.org/10.3758/s13428-016-0729-x

Moshagen, M., Hilbig, B. E., Erdfelder, E., & Moritz, A. (2014). An experimental validation

method for questioning techniques that assess sensitive issues. Experimental Psychology,

61, 48–54. https://doi.org/10.1027/1618-3169/a000226

Moshagen, M., Hilbig, B. E., & Zettler, I. (2014). Faktorenstruktur, psychometrische

Eigenschaften und Messinvarianz der deutschsprachigen Version des 60-Item HEXACO

Persönlichkeitsinventars [Factor structure, psychometric properties, and measurement

invariance of the German-language version of the 60-item HEXACO personality

inventory]. Diagnostica, 60, 86–97. https://doi.org/10.1026/0012-1924/a000112

Moshagen, M., Musch, J., & Erdfelder, E. (2012). A stochastic lie detector. Behavior Research

Methods, 44, 222–231. https://doi.org/10.3758/s13428-011-0144-2

Murphy, S. C., von Hippel, W., Dubbs, S. L., Angilletta, M. J., Jr., Wilson, R. S., Trivers, R., &

Barlow, F. K. (2015). The role of overconfidence in romantic desirability and

competition. Personality and Social Psychology Bulletin, 41, 1036–1052.

https://doi.org/10.1177/0146167215588754

Musch, J. (2003). Personality differences in hindsight bias. Memory, 11, 473–489.

https://doi.org/10.1080/09658210244000540

Musch, J., Brockhaus, R., & Bröder, A. (2002). Ein Inventar zur Erfassung von zwei Faktoren

sozialer Erwünschtheit [An inventory for the assessment of two factors of socially


desirable responding]. Diagnostica, 48, 121–129. https://doi.org/10.1026//0012-

1924.48.3.121

Musch, J., Ostapczuk, M., & Klaiber, Y. (2012). Validating an inventory for the assessment of

egoistic bias and moralistic bias as two separable components of social desirability.

Journal of Personality Assessment, 94, 620–629.

https://doi.org/10.1080/00223891.2012.672505

Nezlek, J. B., Schütz, A., & Sellin, I. (2007). Self-presentational success in daily social

interaction. Self and Identity, 6, 361–379. http://dx.doi.org/10.1080/15298860600979997

Oh, I. S., Wang, G., & Mount, M. K. (2011). Validity of observer ratings of the five-factor model

of personality traits: A meta-analysis. Journal of Applied Psychology, 96, 762–773.

https://doi.org/10.1037/a0021832

O’Neill, T. A., Lee, N. M., Radan, J., Law, S. J., Lewis, R. J., & Carswell, J. J. (2013). The

impact of “non-targeted traits” on personality test faking, hiring, and workplace deviance.

Personality and Individual Differences, 55, 162–168.

https://doi.org/10.1016/j.paid.2013.02.027

Ones, D. S., Viswesvaran, C., & Reiss, A. D. (1996). Role of social desirability in personality

testing for personnel selection: The red herring. Journal of Applied Psychology, 81, 660–

679. https://doi.org/10.1037/0021-9010.81.6.660

Organisation for Economic Co-operation and Development. (2014). PISA 2012 technical report.

Paris, France: OECD Publishing. Retrieved from

https://www.oecd.org/pisa/pisaproducts/PISA-2012-technical-report-final.pdf

Pallier, G., Wilkinson, R., Danthir, V., Kleitman, S., Knezevic, G., Stankov, L., & Roberts, R. D.

(2002). The role of individual differences in the accuracy of confidence judgments.


Journal of General Psychology, 129, 257–299.

https://doi.org/10.1080/00221300209602099

Paulhus, D. L. (1984). Two-component models of socially desirable responding. Journal of

Personality and Social Psychology, 46, 598–609. https://doi.org/10.1037/0022-

3514.46.3.598

Paulhus, D. L. (1991). Measurement and control of response bias. In J. P. Robinson, P. R. Shaver,

& L. S. Wrightsman (Eds.), Measures of social psychological attitudes. Measures of

personality and social psychological attitudes (pp. 17–59). San Diego, CA: Academic

Press.

Paulhus, D. L. (2002). Socially desirable responding: The evolution of a construct. In H. I. Braun,

D. N. Jackson, & D. E. Wiley (Eds.), The role of constructs in psychological and

educational measurement (pp. 49–69). Mahwah, NJ: Erlbaum.

Paulhus, D. L. (2011). Overclaiming on personality questionnaires. In M. Ziegler, C. MacCann,

& R. D. Roberts (Eds.), New perspectives on faking in personality assessment (pp. 151–

164). New York, NY: Oxford University Press.

Paulhus, D. L., & Harms, P. D. (2004). Measuring cognitive ability with the overclaiming

technique. Intelligence, 32, 297–314. https://doi.org/10.1016/j.intell.2004.02.001

Paulhus, D. L., Harms, P. D., Bruce, M. N., & Lysy, D. C. (2003). The over-claiming technique:

Measuring self-enhancement independent of ability. Journal of Personality and Social

Psychology, 84, 890–904. https://doi.org/10.1037/0022-3514.84.4.890

Paulhus, D. L., & Holden, R. R. (2010). Measuring self-enhancement: From self-report to

concrete behavior. In C. R. Agnew (Ed.), Then a miracle occurs: Focusing on behavior in

social psychological theory and research. Purdue Symposium on Psychological Sciences

(pp. 227–246). New York, NY: Oxford University Press.

https://doi.org/10.1093/acprof:oso/9780195377798.003.0012

Paulhus, D. L., & John, O. P. (1998). Egoistic and moralistic biases in self‐perception: The

interplay of self‐deceptive styles with basic traits and motives. Journal of Personality, 66,

1025–1060. https://doi.org/10.1111/1467-6494.00041

Paulhus, D. L., & Vazire, S. (2007). The self-report method. In R. W. Robins, R. C. Fraley, & R.

Krueger (Eds.), Handbook of research methods in personality psychology (pp. 224–239).

New York, NY: Guilford Press.

Pennycook, G., Cheyne, J. A., Barr, N., Koehler, D. J., & Fugelsang, J. A. (2015). On the

reception and detection of pseudo-profound bullshit. Judgment and Decision Making, 10,

549–563.

Pennycook, G., & Rand, D. G. (2019). Who falls for fake news? The roles of bullshit receptivity,

overclaiming, familiarity, and analytic thinking. Journal of Personality. Advance online

publication. https://doi.org/10.1111/jopy.12476

Phillips, D. L., & Clancy, K. J. (1972). Some effects of “social desirability” in survey studies.

American Journal of Sociology, 77, 921–940. https://doi.org/10.1086/225231

Piedmont, R. L., McCrae, R. R., Riemann, R., & Angleitner, A. (2000). On the invalidity of

validity scales: Evidence from self-reports and observer ratings in volunteer samples.

Journal of Personality and Social Psychology, 78, 582–593.

https://doi.org/10.1037//0022-3514.78.3.582

Pohl, R. F. (2007). Ways to assess hindsight bias. Social Cognition, 25, 14–31.

https://doi.org/10.1521/soco.2007.25.1.14

Pohl, R. F., Bayen, U. J., & Martin, C. (2010). A multiprocess account of hindsight bias in

children. Developmental Psychology, 46, 1268–1282. https://doi.org/10.1037/a0020209


Pohl, R. F., & Hell, W. (1996). No reduction in hindsight bias after complete information and

repeated testing. Organizational Behavior and Human Decision Processes, 67, 49–58.

https://doi.org/10.1006/obhd.1996.0064

Reber, R., & Schwarz, N. (1999). Effects of perceptual fluency on judgments of truth.

Consciousness and Cognition: An International Journal, 8, 338–342.

https://doi.org/10.1006/ccog.1999.0386

Reber, R., & Zupanek, N. (2002). Effects of processing fluency on estimates of probability and

frequency. In P. Sedlmeier & T. Betsch (Eds.), ETC. Frequency processing and cognition

(pp. 175–188). New York, NY: Oxford University Press.

https://doi.org/10.1093/acprof:oso/9780198508632.003.0011

Riefer, D. M., & Batchelder, W. H. (1988). Multinomial modeling and the measurement of

cognitive processes. Psychological Review, 95, 318–339. https://doi.org/10.1037/0033-

295X.95.3.318

Roese, N. J., & Jamieson, D. W. (1993). Twenty years of bogus pipeline research: A critical

review and meta-analysis. Psychological Bulletin, 114, 363–375.

https://doi.org/10.1037/0033-2909.114.2.363

Roese, N. J., & Vohs, K. D. (2012). Hindsight bias. Perspectives on Psychological Science, 7,

411–426. https://doi.org/10.1177/1745691612454303

Rothstein, M. G., & Goffin, R. D. (2006). The use of personality measures in personnel selection:

What does current research support? Human Resource Management Review, 16, 155–180.

https://doi.org/10.1016/j.hrmr.2006.03.004

Roulin, N., & Bourdage, J. S. (2017). Once an impression manager, always an impression

manager? Antecedents of honest and deceptive impression management use and


variability across multiple job interviews. Frontiers in Psychology, 8, 29.

https://doi.org/10.3389/fpsyg.2017.00029

Sackeim, H. A., & Gur, R. C. (1978). Self-deception, other-deception, and consciousness. In G.

E. Schwartz & D. Shapiro (Eds.), Consciousness and self-regulation: Advances in

research (Vol. 2, pp. 139–197). New York, NY: Plenum.

Schaefer, P. S., Williams, C. C., Goodie, A. S., & Campbell, W. K. (2004). Overconfidence and

the Big Five. Journal of Research in Personality, 38, 473–480.

https://doi.org/10.1016/j.jrp.2003.09.010

Schlenker, B. R. (1975). Self-presentation: Managing the impression of consistency when reality

interferes with self-enhancement. Journal of Personality and Social Psychology, 32,

1030–1037. https://doi.org/10.1037/0022-3514.32.6.1030

Schwarz, N., Bless, H., Strack, F., Klumpp, G., Rittenauer-Schatka, H., & Simons, A. (1991).

Ease of retrieval as information: Another look at the availability heuristic. Journal of

Personality and Social Psychology, 61, 195–202. https://doi.org/10.1037/0022-

3514.61.2.195

Schwarz, S., & Stahlberg, D. (2003). Strength of hindsight bias as a consequence of meta-

cognitions. Memory, 11, 395–410. https://doi.org/10.1080/09658210244000496

Shalvi, S., Handgraaf, M. J. J., & De Dreu, C. K. W. (2011). Ethical manoeuvring: Why people

avoid both major and minor lies. British Journal of Management, 22, 16–27.

https://doi.org/10.1111/j.1467-8551.2010.00709.x

Silvia, P. J., Nusbaum, E. C., Berg, C., Martin, C., & O’Connor, A. (2009). Openness to

experience, plasticity, and creativity: Exploring lower-order, high-order, and interactive

effects. Journal of Research in Personality, 43, 1087–1090.

https://doi.org/10.1016/j.jrp.2009.04.015


Stanovich, K. E., & West, R. F. (1998). Individual differences in rational thought. Journal of

Experimental Psychology: General, 127, 161–188. https://doi.org/10.1037/0096-

3445.127.2.161

Swets, J. A. (1964). Signal detection and recognition by human observers: Contemporary

readings. New York, NY: Wiley.

Teovanović, P., Knežević, G., & Stankov, L. (2015). Individual differences in cognitive biases:

Evidence against one-factor theory of rationality. Intelligence, 50, 75–86.

https://doi.org/10.1016/j.intell.2015.02.008

Thielmann, I., Heck, D. W., & Hilbig, B. E. (2016). Anonymity and incentives: An investigation

of techniques to reduce socially desirable responding in the Trust Game. Judgment and

Decision Making, 11, 527–536.

Thielmann, I., & Hilbig, B. E. (2018). Is it all about the money? A re-analysis of the link between

honesty-humility and dictator game giving. Journal of Research in Personality, 76, 1–5.

https://doi.org/10.1016/j.jrp.2018.07.002

Thielmann, I., Zimmermann, J., Leising, D., & Hilbig, B. E. (2017). Seeing is knowing: On the

predictive accuracy of self- and informant reports for prosocial and moral behaviours.

European Journal of Personality, 31, 404–418. https://doi.org/10.1002/per.2112

Tonković, M., Galić, Z., & Jerneić, Ž. (2011). The construct validity of over-claiming as a

measure of egoistic enhancement. Review of Psychology, 18, 13–21.

Tracy, J. L., Cheng, J. T., Robins, R. W., & Trzesniewski, K. H. (2009). Authentic and hubristic

pride: The affective core of self-esteem and narcissism. Self and Identity, 8, 196–213.

https://doi.org/10.1080/15298860802505053


Uziel, L. (2010). Rethinking social desirability scales: From impression management to

interpersonally oriented self-control. Perspectives on Psychological Science, 5, 243–262.

https://doi.org/10.1177/1745691610369465

Uziel, L. (2014). Impression management (“lie”) scales are associated with interpersonally

oriented self‐control, not other‐deception. Journal of Personality, 82, 200–212.

https://doi.org/10.1111/jopy.12045

Vazire, S. (2010). Who knows what about a person? The self-other knowledge asymmetry

(SOKA) model. Journal of Personality and Social Psychology, 98, 281–300.

https://doi.org/10.1037/a0017908

Vazire, S., & Carlson, E. N. (2011). Others sometimes know us better than we know ourselves.

Current Directions in Psychological Science, 20, 104–108.

https://doi.org/10.1177/0963721411402478

Vazire, S., & Mehl, M. R. (2008). Knowing me, knowing you: The accuracy and unique

predictive validity of self-ratings and other-ratings of daily behavior. Journal of

Personality and Social Psychology, 95, 1202–1216. https://doi.org/10.1037/a0013314

Vonkova, H., Papajoanu, O., & Stipek, J. (2018). Enhancing the cross-cultural comparability of

self-reports using the overclaiming technique: An analysis of accuracy and exaggeration

in 64 cultures. Journal of Cross-Cultural Psychology, 49, 1247–1268.

https://doi.org/10.1177/0022022118787042

Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer

bias. Journal of the American Statistical Association, 60, 63–69.

Werth, L., & Strack, F. (2003). An inferential approach to the knew-it-all-along phenomenon.

Memory, 11, 411–419. https://doi.org/10.1080/09658210244000586


Wessels, N. M., Zimmermann, J., Biesanz, J. C., & Leising, D. (2018). Differential associations

of knowing and liking with accuracy and positivity bias in person perception. Journal of

Personality and Social Psychology. Advance online publication.

https://doi.org/10.1037/pspp0000218

Whittlesea, B. W. A., & Williams, L. D. (2000). The source of feelings of familiarity: The

discrepancy-attribution hypothesis. Journal of Experimental Psychology: Learning,

Memory, and Cognition, 26, 547–565. https://doi.org/10.1037/0278-7393.26.3.547

Whittlesea, B. W. A., & Williams, L. D. (2001). The discrepancy-attribution hypothesis: I. The

heuristic basis of feelings and familiarity. Journal of Experimental Psychology: Learning,

Memory, and Cognition, 27, 3–13. https://doi.org/10.1037/0278-7393.27.1.3

Wiggins, J. S. (1959). Interrelationships among MMPI measures of dissimulation under standard

and social desirability instruction. Journal of Consulting Psychology, 23, 419–427.

https://doi.org/10.1037/h0047823

Yu, J. W., Tian, G. L., & Tang, M. L. (2008). Two new models for survey sampling with

sensitive characteristic: Design and analysis. Metrika, 67, 251–263.

https://doi.org/10.1007/s00184-007-0131-x

Zettler, I., Hilbig, B. E., Moshagen, M., & de Vries, R. E. (2015). Dishonest responding or true

virtue? A behavioral test of impression management. Personality and Individual

Differences, 81, 107–111. https://doi.org/10.1016/j.paid.2014.10.007

Zhao, K., & Smillie, L. D. (2015). The role of interpersonal traits in social decision making:

Exploring sources of behavioral heterogeneity in economic games. Personality and Social

Psychology Review, 19, 277–302. https://doi.org/10.1177/1088868314553709


Ziegler, M., Kemper, C., & Rammstedt, B. (2013). The Vocabulary and Overclaiming Test

(VOC-T). Journal of Individual Differences, 34, 32–40. https://doi.org/10.1027/1614-

0001/a000093

Zimmermann, J., Schindler, S., Klaus, G., & Leising, D. (2018). The effect of dislike on accuracy

and bias in person perception. Social Psychological and Personality Science, 9, 80–88.

https://doi.org/10.1177/1948550617703167


Appendices

In the following, the three peer-reviewed articles that form the basis of this dissertation are presented in the order of Studies 1–3.

Study 1:

Müller, S., & Moshagen, M. (2019). Controlling for response bias in self-ratings of personality:

A comparison of impression management scales and the overclaiming technique. Journal

of Personality Assessment, 101, 229–236.

https://doi.org/10.1080/00223891.2018.1451870

Study 2:

Müller, S., & Moshagen, M. (2019). True virtue, self-presentation, or both?: A behavioral test of

impression management and overclaiming. Psychological Assessment, 31, 181–191.

https://doi.org/10.1037/pas0000657

Study 3:

Müller, S., & Moshagen, M. (2018). Overclaiming shares processes with the hindsight bias.

Personality and Individual Differences, 134, 298–300.

https://doi.org/10.1016/j.paid.2018.06.035

Study 1:

Müller, S., & Moshagen, M. (2019). Controlling for response bias in self-ratings of

personality: A comparison of impression management scales and the overclaiming

technique. Journal of Personality Assessment, 101, 229–236.

https://doi.org/10.1080/00223891.2018.1451870

© 2018 The Author(s). Published with license by Taylor & Francis Group, LLC.

This is an Open Access article distributed under the terms of the Creative Commons

Attribution-NonCommercial-NoDerivatives License

(http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.

JOURNAL OF PERSONALITY ASSESSMENT, 2019, VOL. 101, NO. 3, 229–236

https://doi.org/10.1080/00223891.2018.1451870

Controlling for Response Bias in Self-Ratings of Personality: A Comparison of Impression Management Scales and the Overclaiming Technique

Sascha Müller and Morten Moshagen

Institute of Psychology and Education, Ulm University, Ulm, Germany

ARTICLE HISTORY: Received 17 June 2017; Revised 20 January 2018

ABSTRACT: Self-serving response distortions pose a threat to the validity of personality scales. A common approach to deal with this issue is to rely on impression management (IM) scales. More recently, the overclaiming technique (OCT) has been proposed as an alternative and arguably superior measure of such biases. In this study (N = 162), we tested these approaches in the context of self- and other-ratings using the HEXACO personality inventory. To the extent that the OCT and IM scales can be considered valid measures of response distortions, they are expected to account for inflated self-ratings in particular for those personality dimensions that are prone to socially desirable responding. However, the results show that neither the OCT nor IM account for overly favorable self-ratings. The validity of IM as a measure of response biases was further scrutinized by a substantial correlation with other-rated honesty-humility. As such, this study questions the use of both the OCT and IM to assess self-serving response distortions.

Researchers using personality inventories often face the 2014; Zettler, Hilbig, Moshagen, & de Vries, 2015). Indeed, problem that respondents might distort self-reports to pre- Paulhus (2011) more recently recommended against using IM sent themselves in a more desirable manner (e.g., Goffin & scales to control self-presentation on self-reports of personal- Boyd, 2009; Hough, 1998). Such self-favoring response dis- ity and introduced the overclaiming technique (OCT; tortions pose a threat to the validity of a personality assess- Paulhus, 2011; Paulhus, Harms, Bruce, & Lysy, 2003)asa ment relying on self-reports. A common approach to deal superior approach. Nonetheless, IM remains a popular and with social desirability bias is the use of impression manage- widely used construct. ment (IM) scales (e.g., the respective subscale of the The OCT attempts to overcome the inherent problem of Balanced Inventory of Desirable Responding [BIDR]; IM scales by assessing deviations of self-reports from a Paulhus, 1991). IM scales rest on the idea that individuals known criterion value. As such, this technique is intended as either claiming highly desirable (but rare) or discarding a behavioral method rather than a questionnaire (Paulhus & highly undesirable (but common) attributes are likely to pre- Holden, 2010). Overclaiming represents the tendency to over- sent themselves overly favorably, so that the extent of state one’s knowledge by claiming to be familiar with fact- response bias can be determined and, in turn, statistic- ually nonexistent terms (Phillips & Clancy, 1972). Unlike IM ally controlled. scales, the OCT exhibits a less clear pattern of associations However, IM scales have been shown to be substantially with broad dimensions of personality, with the most consist- related to self-reported personality traits, in particular to emo- ent finding being an association with openness (Tonkovic, tional stability, conscientiousness, and agreeableness (for a Galic, & Jerneic, 2011), due to which Dunlop et al. (2017) meta-analysis, see Ones, Viswesvaran, & Reiss, 1996). These interpreted overclaiming as “a result of knowledge accumu- results might suggest that IM scores seem to be confounded lated through a general proclivity for cognitive and aesthetic with features of personality; that is, that they do not only rep- exploration” (p. 810). Nonetheless, the OCT has been resent response distortions (style), but also carry substantive reported to be positively associated with measures of self- trait-like information (substance). Thus, IM scores by them- presentation; that is, narcissism and IM (Paulhus et al., 2003; selves cannot unambiguously indicate whether individuals Randall & Fernandes, 1991; Tracy, Cheng, Robins, & provide distorted self-descriptions or whether they truly pos- Trzesniewski, 2009; but see Mesmer-Magnus, Viswesvaran, sess virtues. In fact, some have argued that IM scales primar- Deshpande, & Joseph, 2006). In addition, the OCT increased ily reflect the latter, such as interpersonally oriented self- the validity of achievement striving as a facet of Big Five con- control (Uziel, 2010) or honesty (de Vries, Zettler, & Hilbig, scientiousness for predicting the grade-point average as a

CONTACT Sascha Muller€ [email protected] Institute of Psychology and Education, Ulm University, Albert-Einstein-Allee 47, 89081, ULM Germany ß 2019 the Author(s). Published with license by Taylor & Francis Group, LLC. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 License (http://www.creativecommons.org/licenses/by-nc-nd/4.0/) which permits non-commercial use, reproduction and distribution of the work as published without adaptation or alteration, without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage). 230 MULLER€ AND MOSHAGEN measure of academic performance (Bing, Kluemper, Davison, controlling for IM to reduce self-other discrepancies of the Taylor, & Novicevic, 2011) and distinguished between faked Big Five only for neuroticism. In contrast, in the only study and genuine interviews about attitudes, behavior, and person- so far investigating overclaiming in the context of self- and ality (Kemper & Menold, 2014). However, the OCT does not other-ratings, overclaiming predicted inflated self-reports of seem to be capable of predicting social desirability ratings of a factor primarily made up of extraversion and openness personality items (Kam, Risavy, & Perunovic, 2015). (Paulhus et al., 2003). Overclaiming has also been found to be associated with indi- As of now, no study has directly compared IM scales and cators of careless responding (Barber, Barnes, & Carlson, the OCT as a means to account for discrepancies in self- 2013; Ludeke & Makransky, 2016). Likewise, it was recently and other-reports. Furthermore, all of the aforementioned shown that overclaiming does not share a common core or studies were based on the Big Five model of personality. nomological network with other measures of socially desir- However, the honesty-humility factor of the HEXACO able responding (Bensch, Paulhus, Stankov, & Ziegler, in model of personality (Ashton & Lee, 2009; Lee & Ashton, press). In light of these discrepant findings, it should be fur- 2004) appears to be especially interesting in the context of ther examined whether the OCT can be seen as a measure of socially desirable responding, given that it bears close self-serving response biases. resemblance to IM scales. In fact, IM and honesty-humility Whereas the OCT relies on an objective criterion, it is frequently exhibit substantially positive associations around ¼ more difficult to obtain such a criterion for personality rat- r .50 (Dunlop et al., 2017; Zettler et al., 2015). Similar to ings. One approach is to determine the deviation of self-rat- IM scales where the respondent can either claim or deny ings from other-ratings of personality. Self-other knowledge the possession of highly desirable or undesirable traits, items of personality exhibits obvious asymmetries (Vazire, 2010) of the honesty-humility scale also ask whether one would with correlations between self- and other-ratings in contem- steal, accept a bribe, or use instrumental flattery (Ashton & porary models of personality (e.g., the Big Five or the Lee, 2009). Thus, self-reported honesty-humility should be HEXACO model) typically averaging around .50 (Ashton & particularly prone to self-serving response biases. Lee, 2009; Connelly & Ones, 2010; Connolly, Kavanagh, & The aim of this study was to combine the two approaches — Viswesvaran, 2007). 
These discrepancies between self- and of validating self-reports measures of response bias and — other-reports of personality are due to a number of factors, other-reports to examine the validity of purported measures the most important one arguably being differing informa- of response distortions. To the extent that the OCT and IM tional foundations (Kandler, 2012; Vazire, 2010). Another scales can be considered as reflecting response distortions, it possible reason for the observed differences in self- and is to be expected that both account for deviations between other perception are response distortions in self-ratings due self- and other-ratings in that they are associated with overly to socially desirable responding (Connolly et al., 2007; favorable self-ratings. In particular, our prime interest lay in Connelly & Ones, 2010; Funder, 1995). Given that observers those HEXACO dimensions that arguably entail the highest evaluative component, honesty-humility and agreeableness. usually lack motives for deliberate distortion, other-reports Whereas the case for honesty-humility seems to be quite should be less prone to response biases. Indeed, evidence obvious, agreeableness appears to be the most evaluative trait suggests that well-acquainted observers are able to provide in the sense that strong social norms or a high social value is accurate ratings of personality (e.g., Kolar, Funder, & placed on it (Funder, 1995). In line with this reasoning, Colvin, 1996; McCrae & Weiss, 2007) and that other-reports agreeableness shows the highest self-other discrepancies increase the validity in predicting job performance (Mount, among the dimensions of the Big Five (Connelly & Ones, Barrick, & Strauss, 1994; Oh, Wang, & Mount, 2011) or fea- 2010). Moreover, when assessed in an application context, tures of personality disorders (Miller, Pilkonis, & Clifton, individuals assign themselves higher scores than nonappli- 2005). Thus, other-reports have the potential to expose cants on the HEXACO dimensions of honesty-humility, motivated distortions of self-rated personality (Vazire & agreeableness, extraversion, and conscientiousness (Anglim, Carlson, 2011) and can therefore be used as a testbed to Morse, de Vries, MacCann, & Marty, 2017). evaluate measures attempting to assess response distortions. If a measure is indeed suitable for controlling self-favoring response biases, it should at least partially account for differ- Method ences in self- and other-reports about personality. Participants and procedure Existing studies concerning the ability of IM scales to account for the discrepancy between self- and observer The study has been approved by the local ethics committee reports provided only scattered evidence in favor of IM and was conducted online in close agreement with contem- scales. For example, Piedmont, McCrae, Riemann, and porary standards of online experimenting (Reips, 2002). Angleitner (2000) did not find IM scales to account for Dyad partners took part independently of each other. The shared variance between self- and other-reports of personal- first partner forwarded the link to the study’s homepage to ity. Similarly, Borkenau and Ostendorf (1992) neither found his or her partner upon completion of the study. Thus, moderator nor suppression effects of various IM scales. dyad members had no access to their partner’s ratings. 
After Borkenau and Zaltauskas (2009) reported that IM did not providing informed consent, participants completed the predict the similarity of self- and other-rated personality overclaiming task, the HEXACO self-report, the BIDR, and profiles. Konstabel, Aavik, and Allik (2006) found the HEXACO observer-report. The study closed with RESPONSE BIAS IN SELF-RATINGS OF PERSONALITY 231

Table 1. Self-other correlations and intercorrelations of the HEXACO personality factors, impression management (IM), and overclaiming index (OCI). Factor MSDa 12345678 Ã Ã Ã 1. Honesty-Humility 3.37 = 3.38 0.59 = 0.65 .71 = .81 .46 .14 .01 .34 À.02 .07 .36 À.02 Ã Ã 2. Emotionality 3.42 = 3.36 0.65 = 0.58 .81 = .80 .05 .55 À.13 À.10 .13 .02 .11 À.17 Ã 3. Extraversion 3.46 = 3.51 0.57 = 0.57 .80 = .79 .09 À.03 .59 .12 .01 .07 À.03 .04 Ã Ã Ã Ã Ã 4. Agreeableness 3.12 = 3.32 0.53 = 0.63 .73 = .84 .22 À.21 .21 .54 À.08 .12 .17 À.11 Ã Ã 5. Conscientiousness 3.61 = 3.57 0.58 = 0.70 .81 = .87 À.04 .27 .06 À.08 .52 À.14 .05 À.04 Ã Ã 6. Openness 3.50 = 3.36 0.59 = 0.59 .74 = .76 .11 À.11 .19 .10 À.13 .55 À.07 .05 Ã Ã Ã 7. IM 2.79 = — 0.55 = — .66 = — .49 .20 .05 .24 .12 À.01 —— Ã Ã Ã 8. OCI À0.18 = — 0.40 = ——À.04 À.19 À.03 À.03 À.01 .16 À.17 — Note. N ¼ 162. M ¼ mean (self-ratings = other-ratings); SD ¼ standard deviation (self-ratings = other-ratings); a ¼ Cronbach’s alpha estimate of internal consist- ency (self-ratings = other-ratings). Self-other correlations are presented on the diagonal (bold). Intercorrelations between the self-rated (other-rated) factors are Ãpresented below (above) the diagonal. p < .05. demographic background information and questions regard- HEXACO ing the dyad partners’ relationship. Note that both IM and We used the German version (Moshagen, Hilbig, & Zettler, overclaiming appear to be stable across administration 2014) of the HEXACO–60 (Ashton & Lee, 2009) to assess modes (Dodou & de Winter, 2014; Paulhus & Harms, 2004). the six personality dimensions honesty-humility, emotional- Participants were recruited at a medium-sized German ity, extraversion, agreeableness, conscientiousness, and open- university via mailing lists, social media, and flyers on cam- ness (10 items each). The measure is available both as a self- Ã pus. An a priori power analysis using G Power (Faul, report and as an other report form. Items were answered on Erdfelder, Lang, & Buchner, 2007) indicated that 150 partici- a 5-point Likert scale. For self-reports, Cronbach’s alpha pants (i.e., 75 dyads) are required to detect a medium-sized ranged from .71 (honesty-humility) to .81 (emotionality and effect (corresponding to a standardized regression slope of b conscientiousness). For other-reports, internal consistencies ¼ .20) with a power of .80 given an alpha level of .05. There ranged from .76 (openness) to .87 (conscientiousness). Thus, were no missing data, as participants were technically internal consistencies were similar to other studies (Ashton required to provide a response before proceeding. The drop- & Lee, 2009; Moshagen et al., 2014). out rate was 22.8%. If one dyad member did not complete the study, the dyad was excluded from further analyses. The Overclaiming final sample included 162 individuals forming 81 dyads. Of the participants, 113 (70%) were female and 49 (30%) were To assess overclaiming, we used a set of 60 terms spanning male. Age ranged from 19 to 46 (M ¼ 24.7, SD ¼ 3.74). All three domains (i.e., the arts and philosophy, natural scien- participants indicated native or fluent German language ces, and social sciences; 20 terms each) allegedly concerning skills. Participants received a flat fee of e5 as compensation academic and everyday knowledge. Items of each domain for completing the study. comprised 12 targets (factually existing terms; e.g., categor- The dyads comprised well-acquainted individuals. 
Mean ical imperative) and 8 lures (nonexistent terms; e.g., paradox relationship duration was 57.2 months (SD ¼ 69.6). Ninety- of duplicity). The item set was based on items used in one (56%) participants reported being in touch with their related studies (Dunlop et al., 2017; Musch, Ostapczuk, & dyad partner at least on a daily basis and another 37% at Klaiber, 2012). Item lists were presented in random order, as were the least several times a week. Eighty-three (51%) participants terms within each domain. The measure was introduced as attended the study with a friend; 55 (34%) with their a general knowledge test. Participants were instructed to romantic partner; 13 (8%) with a colleague, fellow student, indicate for each item if they knew the term or not. In most or schoolmate; and 11 (7%) with a relative. Participants implementations of the OCT, participants are asked to rate indicated on a 10-point scale that they knew their dyad part- their familiarity with each item on a continuous scale. ner rather well (M ¼ 7.73, SD ¼ 1.75) and that they were Because the resulting scores are then dichotomized to calcu- quite confident regarding their ratings of the dyad partner’s late indexes of response bias (e.g., Paulhus et al., 2003), we personality (M ¼ 6.71, SD ¼ 1.98). opted to directly use a dichotomous scoring procedure.

Materials Results Balanced inventory of desirable responding Descriptive results The German 20-item version of the BIDR (Musch, Brockhaus, & Broder,€ 2002; Paulhus, 1991) assesses self- As is typical in research comparing self- with other-reports deceptive enhancement and IM, respectively, with 10 items of personality (e.g., Ashton & Lee, 2009; Connelly & Ones, each. All items were rated on a 5-point Likert scale. 2010), the correlations between self- and other-ratings were Cronbach’s alpha of the IM scale was .66, which is in line around .50 (see Table 1), ranging from r ¼ .46 for honesty- with previous research (e.g., Borkenau & Zaltauskas, 2009; humility to r ¼ .59 for extraversion. In the overclaiming Musch et al., 2002; Paulhus, 1991). task, the mean hit rate was .68 (SD ¼ .12) and the mean 232 MULLER€ AND MOSHAGEN

Table 2. Linear mixed models predicting impression management (IM)=overclaiming index (OCI) by self- and other-ratings of personality. 2 2 Model=estimate SOSþ OS– O abs ICC c0ðÞb0 r0ðb0Þ re Overclaiming à Honesty-humility À.04 (.07) .00 (.05) À.04 (.06) À.04 (.11) .00 (.10) .20 À0.05 (.18) .03 (.02) .13 (.02) † à à Emotionality À.09 (.05) À.04 (.07) À.13 (.06) À.05 (.10) À.08 (.13) .15 0.26 (.21) .02 (.02) .13 (.02) à Extraversion À.07 (.06) .07 (.06) .01 (.05) À.14 (.11) .13 (.11) .20 À0.20 (.18) .03 (.02) .13 (.02) à Agreeableness .03 (.06) À.07 (.05) À.04 (.06) .10 (.10) .06 (.13) .17 À0.06 (.18) .03 (.02) .13 (.02) à Conscientiousness .01 (.05) À.02 (.05) À.02 (.05) .03 (.09) .01 (.10) .19 À0.12 (.20) .03 (.02) .13 (.02) à † à à Openness .13 (.06) À.04 (.06) .10 (.06) .17 (.11) .07 (.13) .19 À0.53 (.20) .03 (.02) .13 (.02) Impression management à à à à à à à Honesty-humility .39 (.08) .14 (.07) .52 (.07) .25 (.12) À.27 (.13) .10 1.03 (.24) .03 (.03) .19 (.04) à à à Emotionality .15 (.07) À.02 (.10) .13 (.10) .17 (.14) .04 (.20) .14 2.33 (.34) .04 (.04) .25 (.05) à † à Extraversion .11 (.08) À.09 (.08) .02 (.09) .20 (.13) .18 (.16) .20 2.71 (.32) .06 (.03) .24 (.04) à à à à Agreeableness .23 (.08) .03 (.07) .26 (.07) .20 (.14) À.06 (.14) .18 1.98 (.21) .05 (.03) .23 (.04) à à Conscientiousness .12 (.09) À.03 (.07) .10 (.08) .15 (.13) .05 (.13) .18 2.44 (.30) .05 (.03) .24 (.04) à à Openness .04 (.07) À.07 (.08) À.03 (.08) .11 (.13) .08 (.14) .18 2.89 (.25) .05 (.03) .24 (.04) Note. N ¼ 162. S ¼ self-rating regression coefficient; O ¼ other rating regression coefficient; abs ¼jS – Oj – jS þ Oj. Presented are the fixed effects for 12 sep- arate models of regressing the measures of response biases (OCI=IM) on the respective self- and other-ratings of the HEXACO dimensions as well as random Ãintercepts and variances. Standard errors are shown in parentheses. p < .05. † p < .10. false alarm rate was .22 (SD ¼ .15). Signal detection theory response biases, they are expected to be positively associated (Macmillan & Creelman, 2005), based on a loglinear correc- with differences between self- and other-ratings such that a tion of the hit and false alarm rates (Stanislaw & Todorov, large difference occurs for individuals with a strong ten- 1999), was used to determine the index of discrimination dency to provide self-enhanced responses (and vice versa). accuracy, d’ ¼ z(H) – z(FA), and the location of the criter- To test this assumption, we employed the analysis strategy à ion, c ¼À.5 [z(H) þ z(FA)]. Because c expresses the ten- recently suggested by Humberg et al. (2018). Specifically, we dency to respond “No”, expressed the extent of estimated (hierarchical) linear regression models predicting à overclaiming as OCI ¼ (À1) c, so that higher values of IM or OCI, respectively, by self- and other-ratings of per- the overclaiming index (OCI) correspond to stronger over- sonality separately for each HEXACO factor. A positive claiming. The OCI ranged from À1.24 to 1.04 (M ¼À0.18, regression coefficient of the self-rating in conjunction with a SD ¼ 0.40), indicating sufficient variance in overclaiming. negative coefficient of the other-ratings suggests that self- Participants quite accurately discriminated between targets favoring response distortions are associated with higher self- ¼ ¼ ¼ – and lures (Md’ 1.38, SD 0.46, range 0.47 2.4). 
enhancement (i.e., IM or OCI) scores and, thus, points to Accordingly, OCI and d’ were negatively correlated (r ¼ the validity of IM and OCI in terms of measuring self- À.45, p < .001). Similar to overclaiming, the IM scale also enhancement. To test this difference between these regres- exhibited sufficient variation (M ¼ 2.8, SD ¼ 0.55, range ¼ sion coefficients directly, we defined the auxiliary parameter ¼j – jÀj þ j 1.6–4.2). Overall, descriptive results concerning IM and hit abs bself-rating bother-rating bself-rating bother-rating and false alarm rates (and thus, d’ and OCI) were similar to (Humberg et al., 2018). A positive abs parameter indicates related research (Dunlop et al., 2017; Musch et al., 2012; that self-other discrepancies are associated with individual Paulhus et al., 2003; Zettler et al., 2015). differences in IM or OCI scores. In particular, a significantly – Unexpectedly, IM and OCI were negatively correlated (r positive abs parameter in conjunction with (bself-rating ¼À ¼ .17, p .027), so that higher IM was associated with bother-rating) being positive indicates that higher self-other less overclaiming. The OCI showed rather small correlations discrepancies are associated with higher IM or OCI scores, to the personality dimensions, with the only significant cor- whereas a positive abs parameter in conjunction with (bself- – relations concerning self-rated and other-rated emotionality rating bother-rating) being negative indicates a negative associ- (r ¼À.19, p ¼ .017 and r ¼À.17, p ¼ .029) and self-rated ation between IM or OCI scores and self-other discrepan- openness (r ¼ .16, p ¼ .039). Similar to previous findings cies. If abs is negative, self-other discrepancies are not (e.g., Paulhus, 2002), IM was positively correlated with self- systematically related to IM or OCI scores. Addressing the rated emotionality (r ¼ .20, p ¼ .012) and self- and other- dyadic structure in our data, we computed mixed regression rated agreeableness (r ¼ .24, p ¼ .002 and r ¼ .17, p ¼ models using Mplus 7 (Muthen & Muthen, 1998–2012). A .029). Moreover, IM exhibited positive correlations with random intercept was included to allow for variation both self-rated (r ¼ .49, p < .001) and other-rated (r ¼ .36, between dyads. ICCs indicated that a multilevel approach p < .001) honesty-humility. was required (see Table 2).

Statistical analyses Testing for self-favoring response biases Given that individuals exhibiting self-favoring response The regression results are summarized in Table 2. Neither biases inflate their self-ratings to an overly favorable level, IM nor the OCI were associated with larger discrepancies other-ratings should offer a more realistic view of these between self- and other-ratings as there was no significantly individuals. If IM or the OCT are valid measures of positive abs parameter. Thus, none of the purported RESPONSE BIAS IN SELF-RATINGS OF PERSONALITY 233 measures of self-favoring response biases could account for ratings of honesty-humility should be unrelated (or even such. Apart from that, there was a significantly positive negatively related) to self-reported IM. Thus, the results main effect of self-rated openness on overclaiming. rather suggest that individuals high in IM actually possessed Concerning IM, self- and other-rated honesty-humility coef- high trait levels of honesty-humility (de Vries et al., 2014; ficients were both significantly positive. Zettler et al., 2015). Because the dyads were composed of individuals having Even though overclaiming occurred in our sample, the quite different kinds of relationships, relationship character- OCI failed to account for self-other discrepancies for any istics might have had a confounding influence on target rat- considered personality dimension. Thus, the results are in ings. To control for this, we also ran all the regression line with research suggesting that overclaiming does not models using the various relationship variables (i.e., relation- seem to be an adequate measure of self-favoring response ship length, relationship type, contact frequency, contact bias (Dunlop et al., 2017; Kam et al., 2015). Instead, over- type, and perceived closeness) as potential moderator varia- claiming was positively associated with self-rated openness. bles. However, none of the relationship variables acted as This rather fits the cognitive exploration account of over- a moderator. claiming (Dunlop et al., 2017), which suggests that over- claiming might reflect a dispositional tendency for cognitive and aesthetic exploration. Consistent with this account, Testing for suppression effects Dunlop et al. (2017) found that the residuals of self-rated Following related research (Borkenau & Zaltauskas, 2009; openness controlled for the respective other-ratings were Piedmont et al., 2000), we also tested if IM or OCI act as correlated with overclaiming. Correspondingly, Atir, suppressors; that is, whether the correlation between self- Rosenzweig, and Dunning (2015) observed a link between and other-ratings increases when controlling for IM or OCI. self-perceived knowledge in a domain and overclaiming in To this end, we compared the slopes regressing self- on the same domain. other-ratings with the slopes regressing the self-ratings resi- In tandem, the results suggest that neither IM nor OCI dualized for IM and OCI, respectively, on the other-ratings. can be seen to measure self-serving biases. Moreover, the The regression coefficients using the residualized self-ratings rather surprising negative association between IM and OCI were virtually identical or even significantly smaller (see indicates that these measures could even reflect psycho- Appendix A), thereby disconfirming the presence of sup- logical variables that work in opposite directions. Whereas pressor effects for any of the HEXACO factors. 
We also some previous studies found no relationship between IM compared the slopes regressing other-ratings on self-ratings and OCI (Dunlop et al., 2017; Mesmer-Magnus et al., 2006), with the slopes regressing other-ratings on the residualized we are not aware of any study showing negative associa- self-ratings. Likewise, there were no suppressor effects (see tions. Frankly, the reasons are unclear. Although IM might Appendix B for details). be regarded as reflecting honesty, the OCI itself was unre- lated to this factor and failed to account for self-enhance- ment. Thus, this pattern of associations should be Discussion reexamined in further research. IM scales and the OCT share the purpose of measuring self- This study is subject to some limitations. Most important, favoring response biases. Given the present criticism of such the approach used in this study to evaluate measures of measures (Kam et al., 2015; Uziel, 2010; Zettler et al., 2015), response biases rests on the assumption that differences we argued that a valid measure of self-favoring response between self- and other-reports are at least in part due to bias should account for discrepancies between self- and self-favoring response biases (e.g., Borkenau & Ostendorf, other-ratings of personality. To test this assumption, we let 1992; Piedmont et al., 2000). Clearly, differences also occur well-acquainted dyads rate both their own and their dyad for other reasons, most important as result of different partner’s personality. It was expected that measures of informational foundations (e.g., Connelly & Ones, 2010; response distortions are associated with discrepancies Funder, 1995; McAbee & Connelly, 2016; Vazire, 2010). between self- and other-ratings of personality, as they should Further, individuals who show strong self-presentation ten- account for inflated self-ratings in particular for those per- dencies in an assessment might regularly present themselves sonality dimensions that are prone to socially desir- in a more positive light toward their dyad partners (Nezlek, able responding. Schutz,€ & Sellin, 2007), so that other-ratings are inflated to However, we found neither IM nor OCI to account for a similar extent as the corresponding self-rating (albeit to self-other discrepancies and thus self-enhancement effects the best of others’ knowledge and beliefs). Indeed, it has not being associated with IM or OCI, respectively. In add- been suggested that overlap between self- and other-ratings ition, other parts of the results further question whether IM is, in fact, a result of successful impression management scales can be interpreted as a measure of response biases. In influencing the other-ratings especially for traits with lower particular, IM exhibited substantial associations with both visibility such as neuroticism or agreeableness (Connelly & self- and other-ratings of honesty-humility. If the positive Ones, 2010; Danay & Ziegler, 2011). It should further be association between IM and self-reported honesty-humility noted that other-ratings can also be affected by response is interpreted as reflecting that individuals merely claim to biases. For example, other-ratings of personality are influ- possess this positively valued attribute, the respective other- enced by the extent of liking for the target (Leising, Erbs, & 234 MULLER€ AND MOSHAGEN

Fritz, 2010; Leising, Ostrovski, & Zimmermann, 2013), so level bifactor fodel on the HEXACO Personality Inventory. that the rather close relationships between the dyad partners European Journal of Personality, 31, 669–684. doi:10.1002=per.2120 – in this study could have led to more socially desirable rat- Ashton, M. C., & Lee, K. (2009). The HEXACO 60: A short measure of the major dimensions of personality. Journal of Personality ings as well, even though none of the relationship variables Assessment, 91, 340–345. doi:10.1080=00223890902935878 acted as moderators. Moreover, we deliberately chose not to Atir, S., Rosenzweig, E., & Dunning, D. (2015). When knowledge rely on a high-demand situation (e.g., an applicant setting) knows no bounds: Self-perceived expertise predicts claims of impos- – to elicit socially desirable responding. Paulhus (2011) argued sible knowledge. Psychological Science, 26, 1295 1303. doi: 10.1177=0956797615588195 that overclaiming reflects response biases when an audience Barber, L. K., Barnes, C. M., & Carlson, K. D. (2013). Random and sys- is salient. Although participants were aware of the fact that tematic error effects of insomnia on survey behavior. Organizational they would also be rated by their dyad partner, so that an Research Methods, 16,616–649. doi:10.1177=1094428113493120 audience might have been implicitly present in the form of Bensch, D., Paulhus, D. L., Stankov, L., & Ziegler, M. (in press). their partner, this situation differs from a public assessment. Teasing apart overclaiming, overconfidence, and socially desirable responding. Assessment. Advance online publication. doi: Nevertheless, there was sufficient variation in both IM and 10.1177=1073191117700268 overclaiming in our sample, so that some participants evi- Bing, M. N., Kluemper, D., Davison, H. K., Taylor, S., & Novicevic, M. dently did exhibit the two behaviors. Finally, our overclaim- (2011). Overclaiming as a measure of faking. Organizational ing measure relied on dichotomous knowledge ratings of Behavior and Human Decision Processes, 116, 148–162. doi: = targets and lures that differed from the Likert-type familiar- 10.1016 j.obhdp.2011.05.006 Borkenau, P., & Ostendorf, F. (1992). Social desirability scales as mod- ity ratings (which are later dichotomized) as employed, for erator and suppressor variables. European Journal of Personality, 6, example, by Paulhus et al. (2003). It might be argued that 199–214. doi:10.1002=per.2410060303 different scoring procedures elicit different answering proc- Borkenau, P., & Zaltauskas, K. (2009). Effects of self-enhancement on esses. However, our item set was based on items that were agreement on personality profiles. European Journal of Personality, – = previously either used with continuous familiarity ratings 23, 107 123. doi:10.1002 per.707 Connolly, J. J., Kavanagh, E. J., & Viswesvaran, C. (2007). The conver- (Dunlop et al., 2017) or dichotomous knowledge ratings gent validity between self and observer ratings of personality: A (Musch et al., 2012). Considering that we observed a mean meta-analytic review. International Journal of Selection and false alarm rate very similar to the ones reported by Paulhus Assessment, 15, 110–117. doi:10.1111=j.1468-2389.2007.00371.x et al. (2003) or Dunlop et al. (2017), the scoring procedure Connelly, B. S., & Ones, D. S. (2010). An other perspective on personal- ’ does not seem to have a substantial influence on individuals’ ity: Meta-analytic integration of observers accuracy and predictive validity. 
Psychological Bulletin, 136,1092–1122. doi:10.1037=a0021212 decision processes. In line with this reasoning, Atir, Danay, E., & Ziegler, M. (2011). Is there really a single factor of person- Rosenzweig, and Dunning (2015) directly compared using ality? A multirater approach to the apex of personality. Journal of knowledge ratings and familiarity ratings and found no dif- Research in Personality, 45,560–567. doi:10.1016=j.jrp.2011.07.003 ference between the approaches. de Vries, R. E., Zettler, I., & Hilbig, B. E. (2014). Rethinking trait con- ceptions of social desirability scales: Impression management as an expression of honesty-humility. Assessment, 21, 286–299. doi: Conclusion 10.1177=1073191113504619 Dodou, D., & de Winter, J. (2014). Social desirability is the same in In conclusion, this study indicates that neither IM nor over- offline, online, and paper surveys: A meta-analysis. Computers in – = claiming should be used as measures of self-favoring Human Behavior, 36, 487 495. doi:10.1016 j.chb.2014.04.005 Dunlop, P. D., Bourdage, J. S., de Vries, R. E., Hilbig, B. E., Zettler, I., response distortions, given that both failed to conclusively & Ludeke, S. G. (2017). Openness to (reporting) experiences that account for differences between self- and other-ratings of one never had: Overclaiming as an outcome of the knowledge accu- personality as an indicator of self-enhancement. In contrast, mulated through a proclivity for cognitive and aesthetic exploration. the results are aligned with a cognitive exploration account Journal of Personality and Social Psychology, 113, 810–834. doi: 10.1037=pspp0000110 of overclaiming and with the ideas of considering IM as à Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G Power 3: reflecting interpersonally oriented self-control or honesty A flexible statistical power analysis program for the social, behav- rather than a self-serving response bias. ioral, and biomedical sciences. Behavior Research Methods, 39, 175–191. doi:10.3758=BF03193146 Funder, D. C. (1995). On the accuracy of personality judgment: A real- Acknowledgments istic approach. Psychological Review, 102, 652–670. doi: 10.1037=0033-295X.102.4.652 The data are available in the Open Science Framework at https://osf. Goffin, R. D., & Boyd, A. C. (2009). Faking and personality assessment io=mbje5=. This research was not preregistered. in personnel selection: Advancing models of faking. Canadian Psychology=Psychologie Canadienne, 50, 151–160. doi: 10.1037=a0015946 Funding Hough, L. M. (1998). Effects of intentional distortion in personality This work was supported by the Deutsche Forschungsgemeinschaft measurement and evaluation of suggested palliatives. Human – = (MO 2830=1–1). Performance, 11, 209 244. doi:10.1080 08959285.1998.9668032 Humberg, S., Dufner, M., Schonbrodt,€ F. D., Geukes, K., Hutteman, R., van Zalk, M. H. W., … Back, M. D. (2018). Enhanced versus sim- References ply positive: A new condition-based regression analysis to disentan- gle effects of self-enhancement from effects of positivity of self-view. Anglim, J., Morse, G., de Vries, R. E., MacCann, C., & Marty, A. Journal of Personality and Social Psychology 114, 303-322. doi: (2017). Comparing job applicants to non-applicants using an item- 10.1037=pspp0000134 RESPONSE BIAS IN SELF-RATINGS OF PERSONALITY 235

Kam, C., Risavy, S. D., & Perunovic, W. E. (2015). Using over-claiming technique to probe social desirability ratings of personality items: A validity examination. Personality and Individual Differences, 74, 177–181. doi:10.1016/j.paid.2014.10.017
Kandler, C. (2012). Knowing your personality is knowing its nature: The role of information accuracy of peer assessments for heritability estimates of temperamental and personality traits. Personality and Individual Differences, 53, 387–392. doi:10.1016/j.paid.2012.01.004
Kemper, C. J., & Menold, N. (2014). Nuisance or remedy?: The utility of stylistic responding as an indicator of data fabrication in surveys. Methodology, 10(3), 92–99. doi:10.1027/1614-2241/a000078
Kolar, D. W., Funder, D. C., & Colvin, C. R. (1996). Comparing the accuracy of personality judgements by the self and knowledgeable others. Journal of Personality, 64, 311–337. doi:10.1111/j.1467-6494.1996.tb00513.x
Konstabel, K., Aavik, T., & Allik, J. (2006). Social desirability and consensual validity of personality traits. European Journal of Personality, 20, 549–566. doi:10.1002/per.593
Lee, K., & Ashton, M. C. (2004). Psychometric properties of the HEXACO Personality Inventory. Multivariate Behavioral Research, 39, 329–358. doi:10.1207/s15327906mbr3902_8
Leising, D., Erbs, J., & Fritz, U. (2010). The letter of recommendation effect in informant ratings of personality. Journal of Personality and Social Psychology, 98, 668–682. doi:10.1037/a0018771
Leising, D., Ostrovski, O., & Zimmermann, J. (2013). “Are we talking about the same person here?” Interrater agreement in judgments of personality varies dramatically with how much the perceivers like the targets. Social Psychological and Personality Science, 4, 468–474. doi:10.1177/1948550612462414
Ludeke, S. G., & Makransky, G. (2016). Does the over-claiming questionnaire measure overclaiming? Absent convergent validity in a large community sample. Psychological Assessment, 28, 765–774. doi:10.1037/pas0000211
Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user’s guide (2nd ed.). Mahwah, NJ: Erlbaum.
McAbee, S. T., & Connelly, B. S. (2016). A multi-rater framework for studying personality: The trait-reputation-identity model. Psychological Review, 123, 569–591. doi:10.1037/rev0000035
McCrae, R. R., & Weiss, A. (2007). Observer ratings of personality. In R. W. Robins, R. C. Fraley, & R. Krueger (Eds.), Handbook of research methods in personality psychology (pp. 259–272). New York, NY: Guilford.
Mesmer-Magnus, J., Viswesvaran, C., Deshpande, S., & Joseph, J. (2006). Social desirability: The role of over-claiming, self-esteem, and emotional intelligence. Psychology Science, 48, 336–356.
Miller, J. D., Pilkonis, P. A., & Clifton, A. (2005). Self- and other-reports of traits from the five-factor model: Relations to personality disorder. Journal of Personality Disorders, 19, 400–419. doi:10.1521/pedi.2005.19.4.400
Moshagen, M., Hilbig, B. E., & Zettler, I. (2014). Faktorenstruktur, psychometrische Eigenschaften und Messinvarianz der deutschsprachigen Version des 60-Item HEXACO Persönlichkeitsinventars [Factor structure, psychometric properties, and measurement invariance of the German-language version of the 60-item HEXACO personality inventory]. Diagnostica, 60(2), 86–97. doi:10.1026/0012-1924/a000112
Mount, M. K., Barrick, M. R., & Strauss, J. P. (1994). Validity of observer ratings of the Big Five personality factors. Journal of Applied Psychology, 79, 272–280. doi:10.1037/0021-9010.79.2.272
Musch, J., Brockhaus, R., & Bröder, A. (2002). Ein Inventar zur Erfassung von zwei Faktoren sozialer Erwünschtheit [An inventory assessing two factors of socially desirable responding]. Diagnostica, 48(3), 121–129. doi:10.1026/0012-1924.48.3.121
Musch, J., Ostapczuk, M., & Klaiber, Y. (2012). Validating an inventory for the assessment of egoistic bias and moralistic bias as two separable components of social desirability. Journal of Personality Assessment, 94, 620–629. doi:10.1080/00223891.2012.672505
Muthén, L. K., & Muthén, B. O. (1998–2012). Mplus user’s guide (7th ed.). Los Angeles, CA: Muthén & Muthén.
Nezlek, J. B., Schütz, A., & Sellin, I. (2007). Self-presentational success in daily social interaction. Self and Identity, 6, 361–379. doi:10.1080/15298860600979997
Oh, I.-S., Wang, G., & Mount, M. K. (2011). Validity of observer ratings of the five-factor model of personality traits: A meta-analysis. Journal of Applied Psychology, 96, 762–773. doi:10.1037/a0021832
Ones, D. S., Viswesvaran, C., & Reiss, A. D. (1996). Role of social desirability in personality testing for personnel selection: The red herring. Journal of Applied Psychology, 81, 660–679. doi:10.1037/0021-9010.81.6.660
Paulhus, D. L. (1991). Measurement and control of response bias. In J. P. Robinson, P. R. Shaver, & L. S. Wrightsman (Eds.), Measures of social psychological attitudes: Measures of personality and social psychological attitudes (pp. 17–59). San Diego, CA: Academic Press.
Paulhus, D. L. (2002). Socially desirable responding: The evolution of a construct. In H. I. Braun, D. N. Jackson, & D. E. Wiley (Eds.), The role of constructs in psychological and educational measurement (pp. 49–69). Mahwah, NJ: Erlbaum.
Paulhus, D. L. (2011). Overclaiming on personality questionnaires. In M. Ziegler, C. MacCann, & R. D. Roberts (Eds.), New perspectives on faking in personality assessment (pp. 151–164). New York, NY: Oxford University Press.
Paulhus, D. L., & Harms, P. D. (2004). Measuring cognitive ability with the overclaiming technique. Intelligence, 32, 297–314. doi:10.1016/j.intell.2004.02.001
Paulhus, D. L., Harms, P. D., Bruce, M. N., & Lysy, D. C. (2003). The over-claiming technique: Measuring self-enhancement independent of ability. Journal of Personality and Social Psychology, 84, 890–904. doi:10.1037/0022-3514.84.4.890
Paulhus, D. L., & Holden, R. R. (2010). Measuring self-enhancement: From self-report to concrete behavior. In C. R. Agnew (Ed.), Then a miracle occurs: Focusing on behavior in social psychological theory and research. Purdue Symposium on Psychological Sciences (pp. 227–246). Oxford, UK: Oxford University Press.
Phillips, D. L., & Clancy, K. J. (1972). Some effects of “social desirability” in survey studies. American Journal of Sociology, 77, 921–940. doi:10.1086/225231
Piedmont, R. L., McCrae, R. R., Riemann, R., & Angleitner, A. (2000). On the invalidity of validity scales: Evidence from self-reports and observer ratings in volunteer samples. Journal of Personality and Social Psychology, 78, 582–593. doi:10.1037/0022-3514.78.3.582
Randall, D. M., & Fernandes, M. F. (1991). The social desirability response bias in ethics research. Journal of Business Ethics, 10, 805–817. doi:10.1007/BF00383696
Reips, U.-D. (2002). Standards for Internet-based experimenting. Experimental Psychology, 49, 243–256. doi:10.1026/1618-3169.49.4.243
Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection theory measures. Behavior Research Methods, Instruments, & Computers, 31, 137–149. doi:10.3758/BF03207704
Tonković, M., Galić, Z., & Jerneić, Ž. (2011). The construct validity of over-claiming as a measure of egoistic enhancement. Review of Psychology, 18(1), 13–21.
Tracy, J. L., Cheng, J. T., Robins, R. W., & Trzesniewski, K. H. (2009). Authentic and hubristic pride: The affective core of self-esteem and narcissism. Self and Identity, 8, 196–213. doi:10.1080/15298860802505053
Uziel, L. (2010). Rethinking social desirability scales: From impression management to interpersonally oriented self-control. Perspectives on Psychological Science, 5, 243–262. doi:10.1177/1745691610369465
Vazire, S. (2010). Who knows what about a person? The self-other knowledge asymmetry (SOKA) model. Journal of Personality and Social Psychology, 98, 281–300. doi:10.1037/a0017908
Vazire, S., & Carlson, E. N. (2011). Others sometimes know us better than we know ourselves. Current Directions in Psychological Science, 20, 104–108. doi:10.1177/0963721411402478
Zettler, I., Hilbig, B. E., Moshagen, M., & de Vries, R. E. (2015). Dishonest responding or true virtue?: A behavioral test of impression management. Personality and Individual Differences, 81, 107–111. doi:10.1016/j.paid.2014.10.007

Appendix A. Suppression analysis: Fixed effects of linear mixed models

Fixed effect                         Hself        Eself        Xself        Aself        Cself        Oself
Uncontrolled self-ratings
  Other rating                       .46 (.07)    .55 (.07)    .59 (.06)    .54 (.07)    .50 (.07)    .55 (.07)
Self-ratings controlled for OCI
  Other rating                       .41 (.07)    .53 (.07)    .55 (.07)    .53 (.07)    .52 (.07)    .42* (.07)
Self-ratings controlled for IM
  Other rating                       .30* (.08)   .54 (.07)    .55 (.07)    .51 (.07)    .52 (.07)    .43* (.07)

Note. N = 162. Hself = self-rated honesty-humility; Eself = self-rated emotionality; Xself = self-rated extraversion; Aself = self-rated agreeableness; Cself = self-rated conscientiousness; Oself = self-rated openness; OCI = overclaiming index; IM = impression management. Uncontrolled self-ratings show the (fixed) slopes for the models regressing the self-rated HEXACO personality factors on the respective other-rated personality factors. Controlled self-ratings show the (fixed) slopes for the models regressing the resulting residualized self-ratings (controlling for OCI and IM, respectively) on the other-ratings. All coefficients are standardized; standard errors in parentheses. *Estimates differ significantly from the respective uncontrolled estimate at p < .05.

Appendix B. Suppression analyses predicting other-ratings by self-report data

Fixed effect                         Hother       Eother       Xother       Aother       Cother       Oother
Uncontrolled self-ratings
  Self-rating                        .46 (.07)    .55 (.07)    .59 (.06)    .53 (.07)    .52 (.07)    .55 (.07)
Self-ratings controlled for OCI
  Self-rating                        .41 (.07)    .53 (.07)    .55 (.07)    .52 (.07)    .52 (.07)    .42* (.07)
Self-ratings controlled for IM
  Self-rating                        .30* (.08)   .54 (.07)    .55 (.07)    .50 (.07)    .52 (.07)    .43* (.07)

Note. N = 162. Hother = other-rated honesty-humility; Eother = other-rated emotionality; Xother = other-rated extraversion; Aother = other-rated agreeableness; Cother = other-rated conscientiousness; Oother = other-rated openness; OCI = overclaiming index; IM = impression management. Uncontrolled self-ratings show the (fixed) slopes for the linear mixed models regressing the other-rated HEXACO personality factors on the respective self-rated personality factors. Controlled self-ratings show the (fixed) slopes for the linear mixed models regressing the other-ratings on the resulting residualized self-ratings (controlling for OCI and IM, respectively). All coefficients are standardized; standard errors in parentheses. *Estimates differ significantly from the respective uncontrolled estimate at p < .05.

Study 2:

Müller, S., & Moshagen, M. (2019). True virtue, self-presentation, or both?: A behavioral test

of impression management and overclaiming. Psychological Assessment, 31, 181–191.

https://doi.org/10.1037/pas0000657

© 2018, American Psychological Association. This paper is not the copy of record and may not exactly replicate the final, authoritative version of the article published in

Psychological Assessment. Please do not copy or cite without authors’ permission. The final article is available at https://doi.org/10.1037/pas0000657

True Virtue, Self-Presentation, or Both?:

A Behavioral Test of Impression Management and Overclaiming

Sascha Müller and Morten Moshagen

Ulm University, Germany

Correspondence concerning this article should be addressed to Sascha Müller, Institute of Psychology and Education, Ulm University, Albert-Einstein-Allee 47, 89081 Ulm,

Germany, E-Mail: [email protected].

Preparation of this manuscript was supported by Grant MO 2830/1-1 from the German

Research Foundation (DFG) to the second author.

Abstract

Measures of self-favoring response biases such as impression management (IM) scales or the overclaiming technique (OCT) have been developed to detect distorted self-reports. However, the validity of these approaches has been questioned. In the present study (N = 461), we further examined both IM (assessed by the respective subscale of the BIDR) and the OCT as measures of self-favoring response biases and their associations with honesty-humility.

Specifically, we tested the two competing accounts of IM as a measure of true virtues or a measure of overly positive self-presentation against each other. Using two behavioral paradigms, we corroborated recent findings that higher IM scores are associated with higher trait honesty as well as with more honest behavior. At the same time, individuals with high

IM scores also presented themselves overly favorably, as IM accounted for discrepancies between hypothetical and actual (incentivized) behavior in a dictator game. Thereby, the present results provide direct evidence for the notion that the IM subscale of the BIDR confounds trait honesty with response bias and thus measures both. In contrast, overclaiming was unrelated to trait honesty, all observed behaviors, and self-presentation, indicating that the OCT is not a valid measure of self-favoring response bias.

Keywords: impression management, overclaiming, self-presentation, response bias, social desirability

Public Significance Statement

The present study suggests that neither impression management nor the overclaiming technique is an optimal solution for assessing overly favorable self-reports of psychological characteristics. Whereas overclaiming left the potential threat of invalid responding unaccounted for, impression management captured not only self-presentation but also true virtues.


True Virtue, Self-Presentation, or Both?:

A Behavioral Test of Impression Management and Overclaiming

Since the early days of psychological assessment, researchers have acknowledged that some individuals might distort self-reports to present themselves overly favorably (e.g., Allport, 1928; Bernreuter, 1933; Rosenzweig, 1934). The possible implications of some respondents distorting their ratings in an assessment process are diverse. For example, job applicants who boast about their qualities and conceal their problems gain an advantage over applicants who respond honestly. Especially in clinical or forensic contexts, the consequences of over- or underreporting can be severe, ranging from inpatients trying to be released, through unnecessary hospitalizations costing time and resources, to abusive parents attempting to be granted child custody under false pretenses. This potential problem of overly positive self-reports, often referred to as socially desirable responding or self-favoring response bias, has prompted the development of several approaches intended to detect such response distortions.

As distorting one’s ratings can be regarded as a form of dishonest behavior, using measures of self-favoring response biases or “lie scales” is a popular approach to assess this threat to the validity of an assessment. Such measures are based on the idea that individuals claiming socially desirable (but rare) or denying socially undesirable (but common) attributes generally present themselves overly favorably (e.g., Jackson & Messick, 1958), so that assessing this tendency via a specific scale provides the opportunity to detect the presence of response distortions. Following pioneering works such as the Edwards Social Desirability

Scale (Edwards, 1957), the Marlowe-Crowne Social Desirability Scale (Crowne & Marlowe,

1960), or the MMPI Lie Scale (Hathaway & McKinley, 1951), the Balanced Inventory of

Desirable Responding (BIDR; Paulhus, 1991) emerged as the quasi-standard tool in nonclinical contexts with its impression management (IM) scale designed to assess a response style characterized by overly positive self-presentation that is directed towards an audience.

Therefore, individuals high in IM are thought to distort their self-descriptions in a favorable manner as a means to portray themselves in line with social norms (Paulhus, 1984).

The Validity of Validity Measures

Given that measures of self-favoring response biases are intended to identify individuals engaging in distorted responding, the validity of the measures themselves needs to be demonstrated. However, skepticism concerning the utility of such measures has been raised (e.g., McCrae & Costa, 1983; Ones, Viswesvaran, & Reiss, 1996). Beyond a variety of failed attempts to demonstrate the ability of IM scales to account for self-favoring response bias on self-reports of personality (e.g., Borkenau & Ostendorf, 1992; Borkenau &

Zaltauskas, 2009; Konstabel, Aavik, & Allik, 2006; Piedmont, McCrae, Riemann, &

Angleitner, 2000), IM also exhibits a substantial overlap with self-reported personality traits

(Ones et al., 1996; Paulhus, 2002). If IM scores represent substantive individual differences in personality traits, they confound what is commonly termed “response style” (e.g., the tendency to provide overly positive self-descriptions) with “trait substance” (e.g., the extent of factually being honest, humble, agreeable, etc.).1 This, in turn, gives rise to the question of whether individuals truly possess virtues or whether they merely provide overly favorable self-reports.

As a consequence of this ambiguity, interpretations of high IM scores other than response bias have been suggested. Concerning a true-virtue interpretation of IM, Uziel

(2010) argued that individuals high in IM primarily demonstrate high levels of self-control in social contexts so that IM should be utilized to measure interpersonally oriented self-control.

Testing this notion, Uziel (2014) found self-control to be a major feature of IM. Furthermore,

1 Of note, a stable tendency to misrepresent one’s characteristics of course also reflects trait variance and thus “substance”. Therefore, the common terminology of “substance vs. style” could be regarded as a misnomer. However, given the common understanding of what is meant by these terms, we nonetheless stick to this wording.

IM was associated with larger discrepancies between the actual and the ideal self, and there was high self-other agreement on IM between romantic partners. A related account is to interpret IM as a measure of trait honesty (de Vries, Zettler, & Hilbig, 2014; Zettler, Hilbig,

Moshagen, & de Vries, 2015), with high scores primarily reflecting a high degree of trait honesty. This interpretation received empirical support from strong positive correlations with the honesty-humility (HH) factor of the HEXACO model of personality, averaging around r =

.50 (Dunlop et al., 2017; Zettler et al., 2015). The HEXACO model of personality, derived from lexical studies in several languages (Lee & Ashton, 2008), contains the basic personality dimensions of conscientiousness, emotional stability, extraversion, and openness akin to the

Five Factor Model and a rotated version of agreeableness. Most importantly, HH is considered the sixth broad factor of personality, comprising the facets sincerity, fairness, greed-avoidance, and modesty. As such, HH is focused on capturing peripheral elements of

Big Five agreeableness (Ashton & Lee, 2001) and signals honest, cooperative behavior and a modest, unassuming nature. Accordingly, HH has been shown to be the strongest predictor of fair and altruistic behavior among the broad personality dimensions (Hilbig, Thielmann,

Wührl, & Zettler, 2015), the only consistent (negative) predictor of cheating in behavioral paradigms (Heck, Thielmann, Moshagen, & Hilbig, 2018), and to predict ethical decision-making (Ashton & Lee, 2008).

The trait-honesty account of IM seems plausible considering that the content of the IM scale rather closely resembles the items measuring the HH dimension (Ashton & Lee, 2009;

Lee & Ashton, 2004). On the other hand, it has been discussed whether HH itself also carries some degree of self-favoring response bias (Ashton & Lee, 2010; Ashton, Lee, & de Vries, 2014;

Hilbig, Moshagen, & Zettler, 2015). Thus, a mere correlation between IM and trait honesty should not be interpreted as sufficient evidence of a trait-honesty account concerning IM.

Crucially, the assumption that IM primarily carries trait information of honesty implies that individuals high in IM should also behave honestly (rather than just claiming to do so). In line with this notion, Zettler et al. (2015) examined the behavior associated with IM in an incentivized cheating paradigm. Corroborating the trait-honesty account, high values of IM were associated with less cheating in a coin toss task, indicating that individuals with high IM scores factually behave more honestly.

Whereas these findings support the theoretical idea that IM primarily reflects trait honesty, associations with both trait honesty and honest behavior do not rule out that IM may at the same time reflect self-favoring response biases, that is, high IM scores may reflect both a tendency to misrepresent oneself as well as the tendency to be truly honest. Specifically, high IM scores might indicate honest individuals who exhibit little response bias, dishonest individuals who exhibit a strong response bias, or moderately honest individuals who exhibit a moderate response bias. Indeed, there is some initial evidence that IM could reflect a mélange of trait variance and response bias (Connelly & Chang, 2016; Lönnqvist, Paunonen,

Tuulio-Henriksson, Lönnqvist, & Verkasalo, 2007). However, so far no study has investigated the relations of IM to both dishonest behavior and the extent of response bias simultaneously.

Overclaiming as an Alternative Validity Measure

In light of the findings casting doubt on the interpretation of IM scales as primarily reflecting response distortions, Paulhus (2011) advised against using IM scales in this regard and introduced the overclaiming technique (OCT; Paulhus, Harms, Bruce, & Lysy, 2003) as an alternative measure of self-presentation tendencies in self-reports. As opposed to lie scales, the OCT is a behavioral method rather than a mere questionnaire (Paulhus & Holden, 2010), intended to measure discrepancies between self-reports and a defined criterion value in a socially valued domain. Overclaiming is defined as the tendency to claim familiarity with factually nonexistent terms and, thus, to overstate one’s actual knowledge (Phillips & Clancy,

1972). However, findings concerning the validity of the OCT as a measure of self-favoring response bias have been inconsistent as well. Whereas some studies reported positive associations with indicators of self-enhancement such as IM and narcissism (e.g., Bensch,

Paulhus, Stankov, & Ziegler, in press; Paulhus et al., 2003; Randall & Fernandes, 1991;

Tracy, Cheng, Robins, & Trzesniewski, 2009), others found no or only small associations

(e.g., Barber, Barnes, & Carlson, 2013; Dunlop et al., 2017; Grosz, Lösch, & Back, 2017).

Notably, the OCT is substantially related to openness (Dunlop et al., 2017; Tonković, Galić,

& Jerneić, 2011), which led Dunlop et al. (2017) to conclude that overclaiming should be seen as the result of a dispositional tendency to be curious and explorative (rather than as reflecting response bias). Similarly, overclaiming appears to share processes underlying the hindsight bias (Müller & Moshagen, 2018). More direct attempts to validate the OCT as a measure of self-favoring response bias yielded a similarly inconsistent picture, with some studies in favor (Bing, Kluemper, Davison, Taylor, & Novicevic, 2011; Kemper & Menold, 2014) and some to the disadvantage of the OCT (Kam, Risavy, & Perunovic, 2015; Ludeke &

Makransky, 2016). Moreover, overclaiming is related to indicators of careless responding

(Barber et al., 2013; Ludeke & Makransky, 2016) and a nomological network with other measures of self-favoring response biases could not be confirmed (Bensch et al., in press).

The Present Study

In light of the discrepant findings concerning both IM and the OCT, the present study investigated the interplay of IM, the OCT, a trait measure of honesty, and both self-reported and actual (im)moral behavior, thereby allowing us to clarify the extent to which IM and the OCT can be used as measures of response distortions and, concerning IM in particular, whether IM primarily expresses style, substance, or both. Specifically, the first aim of the present study was to replicate and extend previous findings supporting a trait-honesty account of IM (de

Vries et al., 2014; Dunlop et al., 2017; Zettler et al., 2015) by relying on a fully consequential and incentive-compatible behavioral task. As an immediate measure of dishonest behavior, we used the same behavioral cheating paradigm employed by Zettler et al. (2015). To the extent that IM scores reflect true virtues, it was expected that IM exhibits a negative relation to dishonest behavior in the cheating task.

The second aim of the present study was to examine a response-bias account of IM by considering the discrepancy between self-report and a behavioral criterion measure. To this end, we employed a classic paradigm of experimental economics and behavioral ethics research, namely the dictator game (DG; Forsythe, Horowitz, Savin, & Sefton, 1994) as a variation of the ultimatum game (e.g., Güth, Schmittberger, & Schwarze, 1982; Kahneman,

Knetsch, & Thaler, 1986). In this paradigm, individuals are asked to divide an amount of money at will between themselves and another player. The retained amount can be interpreted as a measure of selfish versus fair and altruistic behavior (Hilbig, Thielmann, et al., 2015). To obtain a discrepancy measure indicating self-presentation, participants also played a hypothetical version of the DG in addition to the actual incentivized DG described above

(with a substantial period of time separating the assessments of the hypothetical and the actual DG). Thereby, participants provided a self-reported estimate of their behavior (in the hypothetical DG) and exhibited actual behavior (in the incentivized DG). Thus, if IM primarily reflects variance related to response styles, higher IM scores should be associated with larger discrepancies between actual and hypothetical behavior in the DG. In contrast, if

IM primarily reflects honesty, high scorers should exhibit small discrepancies between actual and hypothetical DG behaviors.

Finally, as the overclaiming technique is intended as a form of criterion-discrepancy measure designed to assess a general tendency of motivated self-enhancement (Paulhus, 2011;

Paulhus et al., 2003; Paulhus & Holden, 2010), the third objective was to consider the OCT as an alternative measure of self-favoring response bias in direct comparison to IM. Given that overclaiming has been used to capture IM as other-deception (e.g., Bing et al., 2011), it is expected to account for overly positive self-presentation in the form of the difference between actual and hypothetical behavior in the DG when taken as a general measure of self-enhancement. In contrast, if overclaiming merely reflects knowledge exaggeration (or other processes such as cognitive exploration), it is expected to be unrelated to the difference between actual and hypothetical behavior in the DG.

In sum, based on a true-virtue account, IM is expected to show associations with honest behavior in the cheating task. For a response-bias account on the other hand, both IM and the OCT should be associated with each other and with discrepancies between actual and self-reported behavior in the DG.

Method

Participants and Procedure

The study was conducted online in close agreement with contemporary standards of online experimenting (Reips, 2002) and closely adhered to the guidelines of the local ethics board. None of the preconditions that would have required seeking ethics approval (e.g., psychological risks, physical risks, or deception) were met, so the local ethics committee declared this study exempt. Note that both IM and overclaiming appear to be stable across administration modes (Dodou & de Winter, 2014; Paulhus & Harms,

2004). The study relied on a longitudinal design to separate the measures of personality and response biases from the behavioral outcomes and to prevent participants from remembering their exact retained amount of money in the hypothetical DG when playing the actual DG.

Hence, the second part of the study took place 6 weeks after the first part. At the first measurement occasion, after providing informed consent, participants completed the overclaiming task, the HEXACO-60, and the BIDR. Afterwards, participants were asked to play a one-shot hypothetical DG, and to provide demographic background information.

Approximately 6 weeks later, participants were invited to the second part of the study. At the second measurement occasion, participants first completed the cheating task and then the actual DG, both of which were incentivized.

Participants were recruited via an online access panel that provided a sample roughly representative of the German population. The sample comprised 461 individuals who completed both parts of the study.

SD = 13.3). Participants indicated native or fluent German language skills. As intended, there was broad variation concerning occupational status and level of education. Participants received a flat fee provided by the panel provider and an additional payoff depending on their behavior in the cheating task and the actual DG. In both games, individuals could win a maximum of 5 Euro each. At the beginning of the second part of the study, participants were informed that one of these games would be drawn at random after study completion to determine their individual payoff. Additionally, participants received individual feedback about their personality at the end of the study.

Materials

Balanced Inventory of Desirable Responding (BIDR). We used the German 20-item version of the BIDR (Musch, Brockhaus, & Bröder, 2002) which assesses self-deceptive enhancement and IM, respectively, with 10 items each. The IM scale consists of items 21, 23,

24, 25, 29, 30, 33, 35, 36, and 37 of the English BIDR-6 (Paulhus, 1991, 1994). Musch et al.

(2002) dropped the remaining items due to low loadings and/or unclear factor solutions, so that the best items in terms of psychometric properties were retained. The German BIDR shows good convergent validity and captures the two-factor solution of the construct well

(Musch et al., 2002). All items were rated on a 5-point Likert scale. In scoring the IM scale continuously, we employed the standard and recommended procedure (Musch et al., 2002;

Stöber, Dette, & Musch, 2002) of the German BIDR. While recommending dichotomous scoring (i.e., only counting the extreme responses), the BIDR-6 reference manual (Paulhus,

1994) also authorizes the use of a 5-point instead of a 7-point scale and continuous instead of dichotomous scoring. Cronbach’s alpha of the IM scale was .72, which is similar to previous research (e.g., Borkenau & Zaltauskas, 2009; Musch et al., 2002; Paulhus, 1991; Zettler et al.,

2015). When scored dichotomously, Cronbach’s alpha of the IM scale was .61.

Overclaiming. An itemset comprising 60 terms spanning three content domains (20 terms each) allegedly concerning academic and everyday knowledge was used to assess overclaiming. Domains were “the arts and philosophy”, “natural sciences”, and “social sciences”, so the item content was rather similar to the measure developed by Paulhus et al.

(2003). In each domain, items comprised 12 targets (factually existing terms, such as

“categorical imperative”) and 8 lures (non-existent terms, such as “paradox of duplicity”).

The itemset was composed of items previously used in related studies (Dunlop et al., 2017, study 2; Musch, Ostapczuk, & Klaiber, 2012; Müller & Moshagen, 2018; Müller &

Moshagen, in press). As recommended by Paulhus (2011), we therefore relied on items that have been shown to work well in the German language.

The measure was introduced as a general knowledge test. The order of the domains was random and the items within each domain were also presented in random order.

Participants were instructed to indicate for each item if they knew the term or not. In most implementations of the OCT, participants are asked to rate their familiarity with each item on a continuous scale. The resulting scores are then dichotomized (e.g., Paulhus et al., 2003), because the calculation of indices of response bias derived from signal detection theory

(Macmillan & Creelman, 2005) requires two distinct categories of yes and no answers.

However, when detecting a stimulus, there is no intermediate area between detection and nondetection. Therefore, we followed similar approaches (Dunlop et al., 2017, study 2; Musch et al., 2012) and opted to directly use a dichotomous scoring procedure with participants indicating familiarity or unfamiliarity with each item.

HEXACO. The German version (Moshagen, Hilbig, & Zettler, 2014) of the

HEXACO-60 (Ashton & Lee, 2009) measures the six broad dimensions of personality (i.e., honesty-humility, emotionality, extraversion, agreeableness, conscientiousness, and openness) with 10 items each. Factor structure, re-test reliability, and convergent validity of the German version match the English original (Moshagen, Hilbig, & Zettler, 2014). All items were answered on a 5-point Likert scale. Cronbach’s alpha of the honesty-humility scale was .71, which is similar to other studies (Ashton & Lee, 2009; Moshagen, Hilbig, & Zettler, 2014).

Cheating paradigm. To assess (dis)honest behavior, we relied on a variant of the die-under-the-cup paradigm (e.g., Shalvi, Handgraaf, & De Dreu, 2011) used by Zettler et al.

(2015) and others (e.g., Hilbig & Zettler, 2015; Peer, Acquisti, & Shalvi, 2014). Participants were asked to take a conventional coin with two sides, to choose one side of the coin as their target side (i.e., heads or tails), and toss the coin exactly twice. Participants were told that they were eligible for receiving a flat fee of 5 Euro when they tossed the target side exactly twice, but that tossing the target side only once or not at all would not yield any monetary gain.

Finally, participants were asked whether they had tossed their target side exactly two times and were thus eligible to win 5 Euros, which they could either affirm or negate by choosing the respective option. It was explained to participants that we could not detect whether they answered this question honestly, but that we expected them to do so. In addition, participants were informed about the statistical probability of winning both tosses (25%) to prevent individuals from refraining from cheating due to an assumed improbability of winning. This paradigm offers the advantage that participants’ anonymity is fully protected, since both individuals who got lucky and individuals who cheated provide the same response

(Moshagen, Hilbig, Erdfelder, & Moritz, 2014). At the same time, it is possible to estimate the extent of cheating at the aggregate level, given that the probability of winning is known exactly (namely, 25%, following the binomial distribution; for details, see Moshagen & Hilbig, 2017).

Dictator game. A standard operationalization of the DG (e.g., Cherry, Frykblom, &

Shogren, 2002; Forsythe et al., 1994) was employed. The game was introduced as a decision game. In the hypothetical scenario (used at the first measurement occasion), participants were instructed to imagine the situation and behave the same way they would if the situation was real. In particular, participants were asked to imagine being matched with an unknown other participant of the study (whom they would never knowingly meet and who would never knowingly meet them) and were given the task to hypothetically divide an amount of 5 Euro at will between themselves and the other person in whatever way they wanted to. At the second measurement occasion, the scenario was virtually the same with the exception that an actual monetary payoff of 5 Euro was involved. Accordingly, the wording was changed from subjunctive to indicative mode.

Results

Descriptive Results

The IM scale showed substantial variation (M = 2.96, SD = 0.63). In the overclaiming task, the mean hit rate was .59 (SD = .21) and the false alarm rate was .20 (SD = .20). As recommended (Paulhus et al., 2003; Paulhus & Petrusic, 2010), we used signal detection theory with a loglinear correction of the hit and false alarm rates (Stanislaw & Todorov, 1999) to determine the index of discrimination accuracy, d’ = z(H) – z(FA), and the location of the criterion, c = –.5 * [z(H) + z(FA)], as a measure of overclaiming. c denotes the threshold to respond “Yes”, so that higher values of c are associated with a lower probability to respond

“Yes” (i.e., less overclaiming). In line with Paulhus et al. (2003), we thus expressed the extent of overclaiming as OCI = (–1) * c, so that higher values of the OCI correspond to stronger overclaiming. The OCI indicated substantial variance in overclaiming (M = –0.36, SD = 0.65).
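To make the scoring concrete, the following is a minimal R sketch of these signal detection computations under the loglinear correction of Stanislaw and Todorov (1999); the function name and the argument defaults (36 targets and 24 lures, matching the itemset described above) are illustrative rather than taken from the original analysis scripts.

    # Loglinear-corrected SDT indices for the overclaiming task
    # (Stanislaw & Todorov, 1999): add 0.5 to the hit and false alarm
    # counts and 1 to the respective numbers of targets and lures.
    sdt_indices <- function(hits, false_alarms, n_targets = 36, n_lures = 24) {
      H  <- (hits + 0.5) / (n_targets + 1)
      FA <- (false_alarms + 0.5) / (n_lures + 1)
      d_prime <- qnorm(H) - qnorm(FA)           # discrimination accuracy d'
      crit    <- -0.5 * (qnorm(H) + qnorm(FA))  # criterion c
      data.frame(d_prime = d_prime, c = crit, OCI = -crit)  # OCI = (-1) * c
    }

    # Example: claiming to know 25 of the 36 targets and 10 of the 24 lures
    sdt_indices(hits = 25, false_alarms = 10)

Because the correction is applied to the raw counts, the function also works when a respondent claims all or none of the items, where uncorrected hit or false alarm rates of 0 or 1 would otherwise yield infinite z values.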

A closer inspection of d’ revealed that individuals were able to discriminate well between targets and lures (M = 1.26, SD = 0.53). The OCI and d’ were negatively correlated (r = –.20, p < .001). In general, descriptive results concerning IM and the hit and false alarm rates in the overclaiming task were comparable to related research (Dunlop et al., 2017, study 2; Musch et al., 2012; Paulhus et al., 2003; Zettler et al., 2015). In particular, the observed false alarm rate was similar to those reported by other studies regardless of the applied scoring procedures

(e.g., Dunlop et al., 2017; Kam et al., 2015; Paulhus et al., 2003).

To analyze the cheating task, we first compared the proportion of alleged wins (i.e., individuals claiming to have won) to the statistical baseline probability of winning. One hundred and sixty-five (36%) of the participants claimed to have won (i.e., to have obtained two successful tosses), which is significantly higher than would be expected based on the baseline probability of winning (z = 4.83, p < .001) and similar to the proportion observed by

Zettler et al. (2015). We then estimated the proportion of dishonest individuals as proposed by

Moshagen and Hilbig (2017) by d = (q – p) / (1 – p), where q is the sample proportion of reported wins and p denotes the baseline probability of winning. In our sample, the proportion of dishonest individuals thus was d = .15 (SE = .03, p < .001), indicating that the cheating task factually induced cheating behavior.
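As an illustration of this estimator, the following R sketch applies it to the counts reported above. The standard-error formula shown here (the sampling error of q scaled by 1/(1 − p)) is an assumption about the exact computation; small deviations from the reported point estimate can arise from the exact (e.g., maximum-likelihood) estimation used in the original analyses.

    # Aggregate proportion of dishonest respondents in the coin-toss task,
    # d = (q - p) / (1 - p), with q the observed proportion of claimed wins
    # and p = .25 the baseline probability of two successful target tosses.
    n <- 461; wins <- 165
    q <- wins / n
    p <- 0.25
    d_hat <- (q - p) / (1 - p)                 # estimated proportion of cheaters
    se_d  <- sqrt(q * (1 - q) / n) / (1 - p)   # assumed delta-method standard error
    z     <- (q - p) / sqrt(q * (1 - q) / n)   # test against the baseline probability
    round(c(d = d_hat, SE = se_d, z = z), 2)   # roughly d = .14-.15, SE = .03, z = 4.83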

For the two variants of the DG, we obtained a measure of selfish behavior by considering the proportionate amount of money kept for oneself (e.g., Hilbig, Thielmann, et al., 2015). Overall, participants kept more than half of the amount for themselves in both variants (Mhyp = 58.9%, SD = .22 and Mact = 59.7%, SD = .21). The retained amount in the hypothetical variant was positively, but far from perfectly, correlated with the actually retained amount (r = .33, p < .001). We calculated an index of self-presentation by subtracting the percentage of the retained amount in the hypothetical DG from the percentage of the retained amount in the actual DG, DGdiff = DGact – DGhyp. Hence, higher (positive) values correspond to a larger discrepancy between one’s self-reported estimate of the amount of money retained in the hypothetical scenario and the actual amount of money retained in the incentivized scenario, that is, more self-presentation as individuals actually took more money for themselves than initially reported. It should be noted that difference scores are typically less reliable than the scores from which they are derived. However, estimates of internal consistency are unavailable in this context because both DG behaviors were assessed by a single item.
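A minimal sketch of this discrepancy index, assuming per-person vectors dg_act and dg_hyp that hold the proportion of the 5 Euro kept in the incentivized and the hypothetical game, respectively (the variable names and toy values are illustrative):

    # Self-presentation index: actual minus hypothetical share kept for oneself.
    # Positive values mean a person kept more money when real money was at stake
    # than initially claimed in the hypothetical scenario.
    dg_act <- c(0.6, 1.0, 0.5, 0.8)   # proportions kept in the incentivized DG (toy data)
    dg_hyp <- c(0.5, 0.5, 0.5, 0.9)   # proportions kept in the hypothetical DG (toy data)
    dg_diff <- dg_act - dg_hyp        # DGdiff = DGact - DGhyp
    cor(dg_act, dg_hyp)               # hypothetical and actual behavior correlate imperfectly
    mean(dg_diff)                     # average shift toward keeping more when incentivized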

Hypothesis Tests

Given that the observed number of alleged wins in the cheating task comprises both individuals who factually won and individuals who falsely claimed to have won, adapted statistical procedures that account for this contamination are required when investigating the relationship between dishonesty in the cheating task and other variables. Accordingly, all analyses involving cheating behavior were performed using the approach detailed in

Moshagen and Hilbig (2017), which essentially involves an attenuation correction for the disturbance due to actual wins by honest individuals. Associated standard errors were estimated using the bootstrap based on 1,000 samples. A similar correction was applied in the logistic regression analyses predicting dishonesty (see Moshagen & Hilbig, 2017, for details). These analyses were performed using the R package RRreg (Heck & Moshagen,

2018).

The correlation results are summarized in Table 1. Replicating previous research

(Zettler et al., 2015), IM and cheating were negatively correlated (r = –.10, p = .046) so that higher IM scores were associated with more, rather than less, honest behavior in the cheating task. Similar to other studies (de Vries et al., 2014; Dunlop et al., 2017; Zettler et al., 2015), there was a substantial positive correlation between IM and HH (r = .50, p < .001). Thus, IM was positively associated with both self-reported trait honesty and honest behavior in the cheating task. As such, these results are generally in line with a trait-honesty account of IM.

To evaluate whether IM also comprises response bias tendencies (in addition to honest-trait variance), we considered the relation between IM and the discrepancy index derived from the difference between actual and hypothetical DG. Results indicate that higher

IM scores were associated with a larger discrepancy between actual and hypothetical behavior in the DG (r = .22, p < .001). Apparently, participants high in IM claimed to be more humble and cooperative in the DG as long as the scenario was hypothetical (r(IM, DGhyp) = –.27, p <

.001). As soon as actual money was at stake, the association between IM and behavior in the

DG vanished (r(IM, DGact) = –.03, p = .56). Hence, individuals high in IM tended to present themselves in an overly favorable manner, suggesting that IM represents both honest-trait variance (as evident in the cheating task) and a response style (as evident in the difference between actual and hypothetical behavior in the DG).

In contrast to IM, the tests concerning overclaiming provided insufficient evidence to reject any null hypothesis, suggesting that overclaiming did not exhibit significant relations to any of the other variables. Overclaiming was essentially uncorrelated with IM, HH, the behavioral outcomes, and, perhaps most strikingly, with the difference between actual and hypothetical behavior in the DG (Table 1). Following recommendations by Paulhus et al.

(2003) to examine bias and accuracy parameters jointly, we also performed multiple regression analyses. Specifically, we controlled for accuracy by predicting each variable under scrutiny by both the OCI and d’. However, the resulting regression coefficients for the

OCI were closely aligned with the bivariate associations displayed in Table 1, that is, the OCI was not a significant predictor of any outcome, even when controlling for accuracy. Thus, although overclaiming itself is conceptualized in a similar vein as a criterion-discrepancy measure, it was unrelated to the other criterion-discrepancy measure employed in this study, as defined by the difference between actual (criterion) and hypothetical behavior (self-reported estimate) in the DG.

On an interesting side note, we also found self-reported trait honesty (HH) to be positively related to some level of self-presentation. Whereas HH showed a strong negative association with dishonest behavior in the cheating task (r = –.43, p < .001) and also exhibited a negative correlation with selfishness in the hypothetical DG (r = –.30, p < .001), the latter decreased substantially concerning behavior in the actual DG (r = –.19, p < .001). In turn, HH was also positively related to the DG difference (r = .10, p = .032). In contrast to IM, however, HH was still linked with humble, cooperative behavior in the actual DG.

Furthermore, the correlations of IM and HH with the DG difference differed significantly (z =

2.60, p = .009), adding to the notion that IM captured more response bias variance than HH.

Given the intercorrelations between cheating, IM, and HH, we also evaluated IM and

HH jointly using regression analyses. We first regressed cheating on IM and HH (both z-standardized) using logistic regression (Moshagen & Hilbig, 2017). As shown in Table 2, only HH was a significant negative predictor of cheating (OR = 0.27, χ2(1) = 17.8, p <

.001). As expected, the odds of cheating thus decrease by 63% for one standard deviation increase in HH. In addition, we also computed the semi-partial correlation between cheating and IM residualized for HH. When controlling IM for HH, and, thus, the substantial amount of shared variance between them, IM was positively associated with cheating (r = .13, p =

.012), in contrast to the negative bivariate correlation between IM and cheating. For HH on the other hand, the semi-partial correlation between cheating and HH residualized for IM remained virtually the same (r = –.44, p < .001) as the bivariate correlation. We also considered HH and IM as predictors of self-presentation as the difference between actual and hypothetical behavior in the DG (Table 3). IM was the only significant predictor (β = .23, t(457) = 4.25, p < .001), R2 = .05, F(2, 457) = 9.63, p < .001. Thus, the positive bivariate association between self-presentation (as measured by the difference between actual and hypothetical DG) and HH disappeared when controlling for IM.
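As an illustration of the regression reported in Table 3, the following R sketch assumes a data frame dat with person-level columns im, hh, and dg_diff (these names are ours, not taken from the original analysis scripts). The analogous logistic regression for the cheating outcome additionally requires the attenuation correction of Moshagen and Hilbig (2017), for example via the RRreg package cited above, and is therefore not reproduced here.

    # Predicting the self-presentation index (DGdiff) from z-standardized IM and HH,
    # mirroring the ordinary least squares model behind Table 3.
    fit <- lm(dg_diff ~ scale(im) + scale(hh), data = dat)
    summary(fit)  # in the reported results, IM remained the only significant predictor

    # Semi-partial logic used for the cheating analyses: residualize IM on HH,
    # then relate the residuals to an outcome of interest.
    im_res <- resid(lm(im ~ hh, data = dat))
    cor(im_res, dat$dg_diff)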

Finally, we also analyzed the data using a dichotomous scoring procedure for the IM subscale of the BIDR as recommended by Paulhus (1991, 1994) by counting only 4 and 5 responses as 1 point and summing these to obtain the IM scale scores. The dichotomously scored IM scale exhibited the same pattern of correlations to the remaining measures as the continuously scored IM scale (in particular a positive correlation with the difference between actual and hypothetical dictator game), but the correlations tended to be slightly smaller, which is likely due to less variance and/or worse reliability. The correlation between cheating and dichotomously scored IM was exactly the same (r = –.10, p = .046). These additional analyses are presented in full in Supplements 1-3.
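For completeness, a sketch of the dichotomous scoring rule described above, assuming a respondents-by-items matrix im_items of 1–5 responses in which reverse-keyed items have already been recoded (the object name is illustrative):

    # Dichotomous BIDR scoring: only extreme desirable responses (4 or 5 after
    # recoding) earn a point; the IM score is the number of such responses.
    im_dichotomous <- rowSums(im_items >= 4)

    # Continuous scoring, by contrast, simply averages the raw ratings.
    im_continuous <- rowMeans(im_items)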

Discussion

Measures such as IM scales or the OCT are intended to allow for the assessment of self-favoring response biases in studies relying on self-reported data. While the idea of such measures addresses the common threat of invalid responding, empirical tests of the validity of these proposed validity measures have yielded inconsistent results

(e.g., Bensch et al., in press; Bing et al., 2011; Borkenau & Zaltauskas, 2009; Dunlop et al.,

2017; Kam et al., 2015). IM as a construct, in particular, was a target of criticism due to its rather strong associations with personality traits (e.g., Ones et al., 1996), which led some to argue that IM primarily reflects interpersonally oriented self-control (Uziel, 2010, 2014) or trait honesty (de Vries et al., 2014; Zettler et al., 2015), rather than response bias.

To put these competing notions to a test, we let a large sample complete an incentivized cheating task and both a hypothetical and an incentivized dictator game. If IM can be considered to reflect honesty, it would be expected to be positively associated with actual honest behavior. In contrast, if IM primarily measures self-favoring response bias, it should account for self-presentation as measured by the difference between actual behavior in the incentivized DG and self-reported estimates of behavior in the hypothetical DG. In addition, we also considered the overclaiming technique as an alternative approach to assess deviations between self-reports from a known criterion.

Impression Management – A Blend of True Virtues and Response Bias

Overall, our findings are in line with both interpretations of IM suggesting that it blends both trait honesty and self-presentation and, thus, confounds substance with style. In line with a trait-honesty account, the present study found positive associations with trait honesty and honest behavior in the cheating task. In line with a response-bias account, we found IM to be associated with stronger self-presentation, as individuals high in IM kept more money for themselves in the actual DG than they had previously claimed to do in the hypothetical scenario. Corroborating and extending previous research (de Vries et al., 2014;

Zettler et al., 2015), the present study thus indicates that impression managers do not only claim to be high in honesty-humility as they deem it socially desirable, but truly possess virtues and tend to behave accordingly in corresponding situations such as a cheating task.

This trait-honesty account also explains previous meta-analytic results showing that IM exhibits substantial associations with other personality traits such as agreeableness and conscientiousness (Connelly & Chang, 2016; Ones et al., 1996; Paulhus, 2002) as these are important correlates of HH (e.g., Ashton et al., 2014). On the other hand, research suggests that IM still reflects inflated self-reports (Connelly & Chang, 2016; Lönnqvist et al., 2007).

This is perfectly in line with our finding that – while being associated with both trait honesty and honest behavior – IM still accounted for self-presentation as the deviation of actual behavior in the DG from the estimated behavior in the hypothetical DG. Taken together, our results suggest that the IM subscale of the BIDR largely reflects trait variance of HH, but – to a lesser extent – also appears to be sensitive to overly positive self-descriptions.

A measure that comprises trait variance of honesty in addition to response bias might not be optimal when utilized as a validity scale. On the other hand, IM at least did account for self-presentation in contrast to the overclaiming technique. Ultimately, this finding suggests that the IM subscale of the BIDR has some validity in signaling self-favoring response bias and could thus be useful as a screening tool to flag potentially inaccurate responding. Such applications could particularly benefit contexts in which serious decisions are based on the assessment. For example, a prisoner seeking parole showing a high level of IM could be given a more extensive assessment compared to another inmate low in IM. However, due to the apparent confound with trait honesty, the IM scale would be associated with rather high sensitivity in conjunction with rather low specificity when used as a screener and should be accompanied by follow-up tests. Regarding a possible confound, the IM subscale of the BIDR can be compared to the MMPI Lie (L) scale (Hathaway & McKinley, 1951) and the MMPI-2-

RF Uncommon Virtues (L-r) scale (Ben-Porath & Tellegen, 2011). These scales are designed to detect underreporting and their items load highly on impression management (Bagby &

Marshall, 2004). Interestingly, there have also been reports suggesting them to be confounded with virtuousness in the form of religiosity and a “traditional upbringing” (Bridges & Baum,

2013; Duris, Bjorck, & Gorsuch, 2007). More importantly, however, this possible confound does not seem to impair their validity in detecting underreporting (Baer & Miller, 2002;

Rosen, Baldwin, & Smith, 2016; Sellbom & Bagby, 2008). In a similar vein, the IM subscale of the BIDR has shown validity in forensic samples (Davis, Thake, & Weekes, 2012; Lanyon

& Carle, 2007).

Overclaiming – Independent of all Other Measures

Whereas overclaiming was unrelated to both IM and HH and thus was apparently not contaminated by trait variance, it also failed to exhibit any relation to behavior in the paradigms, and, most importantly, to the discrepancy between self-reported and actually performed behavior in the DG. This is especially surprising since the OCT itself is suggested as a form of criterion-discrepancy and behavioral measure rather than a questionnaire such as

IM or other lie scales (Paulhus et al., 2003; Paulhus & Holden, 2010). Our findings are generally in line with notions questioning the validity of overclaiming as a measure of self-favoring response biases. For example, the OCT only weakly predicted faking of self-reports in an applicant setting (Feeney & Goffin, 2015). Likewise, Kam et al. (2015) determined the extent of social desirability of personality items via the OCT and found these ratings to be negatively related to directly evaluated item desirability. Similar to our results, other studies

(e.g., Dunlop et al., 2017; Mesmer-Magnus, Viswesvaran, Deshpande, & Joseph, 2006) also failed to find associations between IM and the OCT. Furthermore, Dunlop et al. suggested a cognitive-exploration account of overclaiming on the basis of its positive association with openness. Thus, overclaiming could rather represent a cognitive bias instead of response distortion, reflecting the tendency to think that one has already heard of a term, rather than a deliberate distortion of one’s self-presentation in order to appear more favorably. A similar

(Müller & Moshagen, 2018).

Honesty-Humility – True Honesty With a Little Response Bias

The results of the present study also indicate that HH is subject to a self-presentation bias, as evidenced by the correlation with the discrepancy between actual and self-reported behavior in the DG. Even though the correlation was considerably smaller than the one shown by IM, this finding nonetheless suggests that there is some level of self-presentation involved in HH, too. Although this issue has been discussed (e.g., Ashton & Lee,

2010; Ashton et al., 2014), its effect was deemed to be rather irrelevant given that HH consistently predicts behavioral outcomes in the realm of honest and humble behavior (e.g.,

Heck et al., 2018; Hilbig, Moshagen, & Zettler, 2015; Hilbig, Thielmann, et al., 2015).

Indeed, unlike IM, which was associated with estimated behavior in the hypothetical dictator game but not with behavior in the actual game, HH still showed a negative association with selfish behavior in both games. Thus, while keeping a little more money than initially claimed, individuals high in HH acted humbly in both situations, that is, in accordance with the trait conception of the scale. Following these findings, both IM and HH

(to some degree) assess the tendency to be honest as shown by negatively predicting cheating behavior. In that, however, HH clearly is the superior measure of true virtues.

Limitations

This study is subject to some limitations. For one, it seems likely that the extent to which IM reflects trait honesty versus response styles may depend on characteristics of the assessment context. The assessment context realized in the present study can be considered as a low-demand situation, as there was little motivation for participants to provide overly positive self-descriptions. It is conceivable that the degree to which IM reflects response biases increases with the demands of the situation. For example, candidates applying for a job, parents in a child custody evaluation, or psychiatric inpatients attempting to be released may have a higher motivation to provide inflated self-reports, which arguably may increase the proportion of variance in IM reflecting response biases. It would be interesting to vary situational demands experimentally while considering a behavioral outcome (such as cheating behavior) to determine how the relative contribution of response style and honest trait variance changes. It might also be interesting to apply latent class analysis on the joint responses to both an IM scale and HH to disentangle response styles and actual trait honesty.

In a similar vein, overclaiming might be context- and content-specific (e.g., Paulhus,

2011). For example, overclaiming might only be indicative of IM when an audience is salient, which was not the case in the present study. Likewise, interpreting the effect of knowledge exaggeration – which was evident for some individuals in the present study – as self-favoring response bias might depend on the relevance of the item content. While the item content of the present itemset was comparable to other studies (e.g., Paulhus et al., 2003), knowledge exaggeration concerning arts and sciences does not have a distinct conceptual link to communally-oriented behavior as assessed by the DG (cf., Gebauer, Sedikides, Verplanken,

& Maio, 2012).

It should also be noted that neither the cheating paradigm nor the DG difference is necessarily an optimal indicator of behavioral dishonesty or self-presentation, respectively, as other forms of these behaviors could be unrelated to the present indicators. Individuals might present themselves overly favorably in a child custody evaluation to obtain custody, but this does not necessarily mean that they will also claim to be more generous than they actually are in a DG.

Furthermore, the hypothetical DG was assessed in conjunction with IM and HH at the first measurement occasion, so these correlations might be inflated due to consistent responding or similar effects.

Finally, there is an ongoing discussion about how to score the IM scale of the BIDR.

We used a 5-point Likert scale that was scored continuously and showed that analyzing the data with a dichotomously scored IM scale yielded equivalent results. While empirical

2015), different results might be obtained when using a 7-point scale (e.g., Gignac, 2018). A related issue pertains to the scale length of the BIDR. For psychometric reasons, the German version (Musch et al., 2002) contains only half the items of the English BIDR-6. While item selection proceeded in line with the factorial structure of the IM construct, the results should be replicated using the full itemset of the BIDR-6. Similarly, we only considered the IM scale of the BIDR so that conclusions drawn regarding IM are limited to the scale used in this study. As it remains an open question if the present findings are generalizable to other IM scales, it is recommended to investigate whether related scales that are based on a similar rationale (such as the MMPI-2-RF L-r scale or the Marlowe-Crowne Social Desirability

Scale) behave similarly.

Conclusion

In conclusion, the present study provides evidence supporting both a response-bias and trait-honesty account concerning the IM subscale of the BIDR, indicating that it must be considered to reflect a blend of style and substance. However, whereas IM does account for self-presentation to a certain extent, the construct is highly contaminated by trait variance of honesty-humility. As such, high scores on IM cannot immediately be interpreted as reflecting self-favoring response biases that may invalidate an assessment. Nevertheless, IM might still be useful as a screening instrument, provided that the arguably low specificity of such scores is taken into account. On the other hand, overclaiming as assessed in the present study did not account for self-presentation and also failed to show any relation to behavior, personality, or

IM. While this may cast doubt on the validity of the OCT as a measure of self-presentation, the underlying processes influencing overclaiming tendencies should be examined more closely.


References

Allport, G. W. (1928). A test for ascendance-submission. The Journal of Abnormal and Social

Psychology, 23, 118–136. https://doi.org/10.1037/h0074218

Ashton, M. C., & Lee, K. (2001). A theoretical basis for the major dimensions of personality.

European Journal of Personality, 15, 327–353. https://doi.org/10.1002/per.417

Ashton, M. C., & Lee, K. (2008). The prediction of honesty-humility-related criteria by the

HEXACO and Five-Factor Models of personality. Journal of Research in Personality,

42, 1216–1228. https://doi.org/10.1016/j.jrp.2008.03.006

Ashton, M. C., & Lee, K. (2009). The HEXACO-60: A short measure of the major

dimensions of personality. Journal of Personality Assessment, 91, 340–345.

https://doi.org/10.1080/00223890902935878

Ashton, M. C., & Lee, K. (2010). Trait and source factors in HEXACO-PI-R self- and

observer reports. European Journal of Personality, 24, 278–289.

https://doi.org/10.1002/per.759

Ashton, M. C., Lee, K., & De Vries, R. E. (2014). The HEXACO honesty-humility,

agreeableness, and emotionality factors: A review of research and theory. Personality

and Social Psychology Review, 18, 139–152.

https://doi.org/10.1177/1088868314523838

Baer, R. A., & Miller, J. (2002). Underreporting of psychopathology on the MMPI-2: A meta-

analytic review. Psychological Assessment, 14, 16–26. https://doi.org/10.1037/1040-

3590.14.1.16

Bagby, R. M., & Marshall, M. B. (2004). Assessing underreporting response bias on the

MMPI-2. Assessment, 11, 115–126. https://doi.org/10.1177/1073191104265918

Barber, L. K., Barnes, C. M., & Carlson, K. D. (2013). Random and systematic error effects

of insomnia on survey behavior. Organizational Research Methods, 16, 616–649.

https://doi.org/10.1177/1094428113493120

Ben-Porath, Y. S., & Tellegen, A. (2011). MMPI-2-RF: Manual for administration, scoring,

and interpretation. Minneapolis, MN: University of Minnesota Press.

Bensch, D., Paulhus, D. L., Stankov, L., & Ziegler, M. (in press). Teasing apart overclaiming,

overconfidence, and socially desirable responding. Assessment. Advance online

publication. https://doi.org/10.1177/1073191117700268.

Bernreuter, R. G. (1933). Validity of the personality inventory. Personnel Journal, 11, 383–

386.

Bing, M. N., Kluemper, D., Davison, H. K., Taylor, S., & Novicevic, M. (2011).

Overclaiming as a measure of faking. Organizational Behavior and Human Decision

Processes, 116, 148–162. https://doi.org/10.1016/j.obhdp.2011.05.006

Borkenau, P., & Ostendorf, F. (1992). Social desirability scales as moderator and suppressor

variables. European Journal of Personality, 6, 199–214.

https://doi.org/10.1002/per.2410060303

Borkenau, P., & Zaltauskas, K. (2009). Effects of self-enhancement on agreement on

personality profiles. European Journal of Personality, 23, 107–123.

https://doi.org/10.1002/per.707

Bridges, S. A., & Baum, L. J. (2013). An examination of the MMPI-2-RF L-r scale in an

outpatient Protestant sample. Journal of Psychology and Christianity, 32, 115–124.

Cherry, T. L., Frykblom, P., & Shogren, J. F. (2002). Hardnose the dictator. The American

Economic Review, 92, 1218–1221.

Connelly, B. S., & Chang, L. (2016). A meta‐analytic multitrait multirater separation of

substance and style in social desirability scales. Journal of Personality, 84, 319–334.

https://doi.org/10.1111/jopy.12161

Crowne, D. P., & Marlowe, D. (1960). A new scale of social desirability independent of

psychopathology. Journal of Consulting Psychology, 24, 349–354.

Davis, C. G., Thake, J., & Weekes, J. R. (2012). Impression managers: Nice guys or serious

criminals? Journal of Research in Personality, 46, 26–31.

https://doi.org/10.1016/j.jrp.2011.11.001

de Vries, R. E., Zettler, I., & Hilbig, B. E. (2014). Rethinking trait conceptions of social

desirability scales: Impression management as an expression of honesty-humility.

Assessment, 21, 286–299. https://doi.org/10.1177/1073191113504619

Dodou, D., & de Winter, J. (2014). Social desirability is the same in offline, online, and paper

surveys: A meta-analysis. Computers in Human Behavior, 36, 487–495.

https://doi.org/10.1016/j.chb.2014.04.005

Dunlop, P. D., Bourdage, J. S., de Vries, R. E., Hilbig, B. E., Zettler, I., & Ludeke, S. G.

(2017). Openness to (reporting) experiences that one never had: Overclaiming as an

outcome of the knowledge accumulated through a proclivity for cognitive and

aesthetic exploration. Journal of Personality and Social Psychology, 113, 810–834.

https://doi.org/10.1037/pspp0000110

Duris, M., Bjorck, J. P., & Gorsuch, R. L. (2007). Christian subcultural differences in item

perceptions of the MMPI-2 Lie Scale. Journal of Psychology & Christianity, 26, 356–

366.

Edwards, A. L. (1957). The social desirability variable in personality assessment and

research. New York, NY: The Dryden Press.

Feeney, J. R., & Goffin, R. D. (2015). The Overclaiming Questionnaire: A good way to

measure faking? Personality and Individual Differences, 82, 248–252.

https://doi.org/10.1016/j.paid.2015.03.038

Forsythe, R., Horowitz, J. L., Savin, N. E., & Sefton, M. (1994). Fairness in simple

bargaining experiments. Games and Economic Behavior, 6, 347–369.

https://doi.org/10.1006/game.1994.1021

Gebauer, J. E., Sedikides, C., Verplanken, B., & Maio, G. R. (2012). Communal narcissism.

Journal of Personality and Social Psychology, 103, 854–878.

https://doi.org/10.1037/a0029629

Gignac, G. E. (2018). Socially desirable responding suppresses the association between self-

assessed intelligence and task-based intelligence. Intelligence, 69, 50–58.

https://doi.org/10.1016/j.intell.2018.05.003

Grosz, M. P., Lösch, T., & Back, M. D. (2017). A dimension- and content-specific approach

to investigate the narcissism-overclaiming link. Journal of Research in Personality,

70, 134–138. https://doi.org/10.1016/j.jrp.2017.05.006

Güth, W., Schmittberger, R., & Schwarze, B. (1982). An experimental analysis of ultimatum

bargaining. Journal of Economic Behavior & Organization, 3, 367–388.

https://doi.org/10.1016/0167-2681(82)90011-7

Hathaway, S. R., & McKinley, J. C. (1951). Minnesota Multiphasic Personality Inventory

manual, revised. San Antonio, TX: Psychological Corporation.

Heck, D. W., & Moshagen, M. (2018). RRreg: An R package for correlation and regression

analyses of randomized response data. Journal of Statistical Software, 85, 1–29.

https://doi.org/10.18637/jss.v085.i02

Heck, D. W., Thielmann, I., Moshagen, M., & Hilbig, B. E. (2018). Who lies? A large-scale

reanalysis linking basic personality traits to unethical decision making. Judgment and

Decision Making, 13, 356–371.

Hilbig, B. E., Moshagen, M., & Zettler, I. (2015). Truth will out: Linking personality,

morality, and honesty through indirect questioning. Social Psychological and

Personality Science, 6, 140–147. https://doi.org/10.1177/1948550614553640

Hilbig, B. E., Thielmann, I., Wührl, J., & Zettler, I. (2015). From honesty-humility to fair

behavior – Benevolence or a (blind) fairness norm? Personality and Individual

Differences, 80, 91–95. https://doi.org/10.1016/j.paid.2015.02.017

Hilbig, B. E., & Zettler, I. (2015). When the cat’s away, some mice will play: A basic trait

account of dishonest behavior. Journal of Research in Personality, 57, 72–88.

https://doi.org/10.1016/j.jrp.2015.04.003

Jackson, D. N., & Messick, S. (1958). Content and style in personality assessment.

Psychological Bulletin, 55, 243–252. https://doi.org/10.1037/h0045996

Kahneman, D., Knetsch, J. L., & Thaler, R. H. (1986). Fairness and the assumptions of

economics. Journal of Business, 59, 285–300.

Kam, C., Risavy, S. D., & Perunovic, W. E. (2015). Using over-claiming technique to probe

social desirability ratings of personality items: A validity examination. Personality

and Individual Differences, 74, 177–181. https://doi.org/10.1016/j.paid.2014.10.017

Kemper, C. J., & Menold, N. (2014). Nuisance or remedy? The utility of stylistic responding

as an indicator of data fabrication in surveys. Methodology, 10, 92–99.

https://doi.org/10.1027/1614-2241/a000078

Konstabel, K., Aavik, T., & Allik, J. (2006). Social desirability and consensual validity of

personality traits. European Journal of Personality, 20, 549–566.

https://doi.org/10.1002/per.593

Lanyon, R. I., & Carle, A. C. (2007). Internal and external validity of scores on the Balanced

Inventory of Desirable Responding and the Paulhus Deception Scales. Educational

and Psychological Measurement, 67, 859–876.

https://doi.org/10.1177/0013164406299104

Lee, K., & Ashton, M. C. (2004). Psychometric properties of the HEXACO Personality

Inventory. Multivariate Behavioral Research, 39, 329–358.

https://doi.org/10.1207/s15327906mbr3902_8

Lee, K., & Ashton, M. C. (2008). The HEXACO personality factors in the indigenous

personality lexicons of English and 11 other languages. Journal of Personality, 76,

1001–1054. https://doi.org/10.1111/j.1467-6494.2008.00512.x

Li, A., & Bagger, J. (2007). The Balanced Inventory of Desirable Responding (BIDR): A

reliability generalization study. Educational and Psychological Measurement, 67,

525–544. https://doi.org/10.1177/0013164406292087

Lönnqvist, J. E., Paunonen, S., Tuulio‐Henriksson, A., Lönnqvist, J., & Verkasalo, M. (2007).

Substance and style in socially desirable responding. Journal of Personality, 75, 291–

322. https://doi.org/10.1111/j.1467-6494.2006.00440.x

Ludeke, S. G., & Makransky, G. (2016). Does the Over-Claiming Questionnaire measure

overclaiming? Absent convergent validity in a large community sample. Psychological

Assessment, 28, 765–774. https://doi.org/10.1037/pas0000211

Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user’s guide (2nd ed.).

Mahwah, NJ: Erlbaum.

McCrae, R. R., & Costa, P. T. (1983). Social desirability scales: More substance than style.

Journal of Consulting and Clinical Psychology, 51, 882–888.

https://doi.org/10.1037/0022-006X.51.6.882

Mesmer-Magnus, J., Viswesvaran, C., Deshpande, S., & Joseph, J. (2006). Social desirability:

The role of over-claiming, self-esteem, and emotional intelligence. Psychology

Science, 48, 336–356.

Moshagen, M., & Hilbig, B. E. (2017). The statistical analysis of cheating paradigms.

Behavior Research Methods, 49, 724–732. https://doi.org/10.3758/s13428-016-0729-x

Moshagen, M., Hilbig, B. E., Erdfelder, E., & Moritz, A. (2014). An experimental validation

method for questioning techniques that assess sensitive issues. Experimental

Psychology, 61, 48–54. https://doi.org/10.1027/1618-3169/a000226

Moshagen, M., Hilbig, B. E., & Zettler, I. (2014). Faktorenstruktur, psychometrische

Eigenschaften und Messinvarianz der deutschsprachigen Version des 60-Item

HEXACO Persönlichkeitsinventars [Factor structure, psychometric properties, and

measurement invariance of the German-language version of the 60-item HEXACO

personality inventory]. Diagnostica, 60, 86–97. https://doi.org/10.1026/0012-

1924/a000112

Müller, S., & Moshagen, M. (2018). Overclaiming shares processes with the hindsight bias.

Personality and Individual Differences, 134, 298–300.

https://doi.org/10.1016/j.paid.2018.06.035

Müller, S., & Moshagen, M. (in press). Controlling for response bias in self-ratings of

personality – A comparison of impression management scales and the overclaiming

technique. Journal of Personality Assessment.

https://doi.org/10.1080/00223891.2018.1451870

Musch, J., Brockhaus, R., & Bröder, A. (2002). Ein Inventar zur Erfassung von zwei Faktoren

sozialer Erwünschtheit [An inventory for the assessment of two factors of socially

desirable responding]. Diagnostica, 48, 121–129. https://doi.org/10.1026//0012-

1924.48.3.121

Musch, J., Ostapczuk, M., & Klaiber, Y. (2012). Validating an inventory for the assessment of

egoistic bias and moralistic bias as two separable components of social desirability.

Journal of Personality Assessment, 94, 620–629.

https://doi.org/10.1080/00223891.2012.672505

Ones, D. S., Viswesvaran, C., & Reiss, A. D. (1996). Role of social desirability in personality

testing for personnel selection: The red herring. Journal of Applied Psychology, 81,

660–679. https://doi.org/10.1037/0021-9010.81.6.660

Paulhus, D. L. (1984). Two-component models of socially desirable responding. Journal of

Personality and Social Psychology, 46, 598–609. https://doi.org/10.1037/0022-

3514.46.3.598

Paulhus, D. L. (1991). Measurement and control of response bias. In J. P. Robinson, P. R.

Shaver, & L. S. Wrightsman (Eds.), Measures of social psychological attitudes.

Measures of personality and social psychological attitudes (pp. 17–59). San Diego,

CA: Academic Press.

Paulhus, D. L. (1994). Balanced Inventory of Desirable Responding: Reference manual for

BIDR version 6. Unpublished manuscript, University of British Columbia, Vancouver,

Canada.

Paulhus, D. L. (2002). Socially desirable responding: The evolution of a construct. In H. I.

Braun, D. N. Jackson, & D. E. Wiley (Eds.), The role of constructs in psychological

and educational measurement (pp. 49–69). Mahwah, NJ: Erlbaum.

Paulhus, D. L. (2011). Overclaiming on personality questionnaires. In M. Ziegler, C.

MacCann, & R. D. Roberts (Eds.), New perspectives on faking in personality

assessment (pp. 151–164). Oxford, NY: Oxford University Press.

Paulhus, D. L., & Harms, P. D. (2004). Measuring cognitive ability with the overclaiming

technique. Intelligence, 32, 297–314. https://doi.org/10.1016/j.intell.2004.02.001

Paulhus, D. L., Harms, P. D., Bruce, M. N., & Lysy, D. C. (2003). The over-claiming

technique: Measuring self-enhancement independent of ability. Journal of Personality

and Social Psychology, 84, 890–904. https://doi.org/10.1037/0022-3514.84.4.890

Paulhus, D. L., & Holden, R. R. (2010). Measuring self-enhancement: From self-report to

concrete behavior. In C. R. Agnew (Ed.), Then a miracle occurs. Focusing on

behavior in social psychological theory and research; Purdue symposium on

psychological sciences (pp. 227–246). Oxford, NY: Oxford University Press.

https://doi.org/10.1093/acprof:oso/9780195377798.003.0012

Paulhus, D. L., & Petrusic, W. (2010). Measuring individual differences with signal detection

analysis: A guide to indexes based on knowledge ratings. Unpublished Manuscript,

University of British Columbia, Vancouver, Canada.

Peer, E., Acquisti, A., & Shalvi, S. (2014). “I cheated, but only a little”: Partial confessions to

unethical behavior. Journal of Personality and Social Psychology, 106, 202–217.

https://doi.org/10.1037/a0035392

Phillips, D. L., & Clancy, K. J. (1972). Some effects of “social desirability” in survey studies.

American Journal of Sociology, 77, 921–940. https://doi.org/10.1086/225231

Piedmont, R. L., McCrae, R. R., Riemann, R., & Angleitner, A. (2000). On the invalidity of

validity scales: Evidence from self-reports and observer ratings in volunteer samples.

Journal of Personality and Social Psychology, 78, 582–593.

https://doi.org/10.1037//0022-3514.78.3.582

Randall, D. M., & Fernandes, M. F. (1991). The social desirability response bias in ethics

research. Journal of Business Ethics, 10, 805–817.

https://doi.org/10.1007/BF00383696

Reips, U.-D. (2002). Standards for internet-based experimenting. Experimental Psychology,

49, 243–256. https://doi.org/10.1026//1618-3169.49.4.243

Rosen, G. M., Baldwin, S. A., & Smith, R. E. (2016). Reassessing the “traditional background

hypothesis” for elevated MMPI and MMPI-2 Lie-scale scores. Psychological

Assessment, 28, 1336–1343. https://doi.org/10.1037/pas0000262

Rosenzweig, S. (1934). A suggestion for making verbal personality tests more valid.

Psychological Review, 41, 400–401. https://doi.org/10.1037/h0075452

Sellbom, M., & Bagby, R. M. (2008). Validity of the MMPI-2-RF (restructured form) L-r and

K-r scales in detecting underreporting in clinical and nonclinical samples.

Psychological Assessment, 20, 370–376. https://doi.org/10.1037/a0012952

Shalvi, S., Handgraaf, M. J. J., & De Dreu, C. K. W. (2011). Ethical manoeuvring: Why

people avoid both major and minor lies. British Journal of Management, 22, 16–27.

https://doi.org/10.1111/j.1467-8551.2010.00709.x

Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection theory measures.

Behavior Research Methods, Instruments, & Computers, 31, 137–149.

https://doi.org/10.3758/BF03207704

Stöber, J., Dette, D. E., & Musch, J. (2002). Comparing continuous and dichotomous scoring

of the Balanced Inventory of Desirable Responding. Journal of Personality

Assessment, 78, 370–389. https://doi.org/10.1207/S15327752JPA7802_10

Tonković, M., Galić, Z., & Jerneić, Ž. (2011). The construct validity of over-claiming as a

measure of egoistic enhancement. Review of Psychology, 18, 13–21.

Tracy, J. L., Cheng, J. T., Robins, R. W., & Trzesniewski, K. H. (2009). Authentic and

hubristic pride: The affective core of self-esteem and narcissism. Self and Identity, 8,

196–213. https://doi.org/10.1080/15298860802505053

Uziel, L. (2010). Rethinking social desirability scales: From impression management to

interpersonally oriented self-control. Perspectives on Psychological Science, 5, 243–

262. https://doi.org/10.1177/1745691610369465

Uziel, L. (2014). Impression management (“lie”) scales are associated with interpersonally

oriented self‐control, not other‐deception. Journal of Personality, 82, 200–212.

https://doi.org/10.1111/jopy.12045

Vispoel, W. P., & Kim, H. Y. (2014). Psychometric properties for the Balanced Inventory of

Desirable Responding: Dichotomous versus polytomous conventional and IRT

scoring. Psychological Assessment, 26, 878–891. https://doi.org/10.1037/a0036430

Zettler, I., Hilbig, B. E., Moshagen, M., & de Vries, R. E. (2015). Dishonest responding or

true virtue? A behavioral test of impression management. Personality and Individual

Differences, 81, 107–111. https://doi.org/10.1016/j.paid.2014.10.007

Table 1

Correlations Between Trait, Bias, and Behavioral Measures

Variable M SD 1 2 3 4 5 6

1. IM 2.96 0.63 (.72)

2. OCI –0.36 0.65 –.04 (-)

3. HH 3.50 0.61 .50** .05 (.71)

4. DG hyp .59 .22 –.27** .04 –.30** (-)

5. DG act .60 .21 –.03 –.01 –.19** .33** (-)

6. DG diff .01 .25 .22** –.04 .10* - - (-)

7. Cheating .15a – –.10* –.02 –.43** .24** .34** .08

Note. N = 461. M = mean; SD = standard deviation; IM = impression management; OCI = overclaiming index; HH = honesty-humility; DG hyp = percentage of retained amount in hypothetical dictator game; DG act = percentage of retained amount in actual dictator game; DG diff = difference between DG act and DG hyp; Cheating = probability of having cheated. Internal consistencies are presented on the diagonal in parentheses. DG act and Cheating were assessed following a 6-weeks delay. ** p < .001, * p < .05. a Estimated proportion of dishonest individuals (see text for details).

Table 2

Logistic Regression Predicting Cheating by IM and HH

Variable β SEβ OR Wald Statistic p r(X•Y), Cheating

Intercept –2.08 0.33 – 39.2 < .001

IM 0.49 0.29 1.64 2.83 .083 .13*

HH –1.32 0.31 0.27 17.8 < .001 –.44**

Note. N = 461. OR = odds ratio; IM = impression management; HH = honesty-humility; r(X•Y), Cheating = semi-partial correlations between cheating and IM or HH residualized for the respective other (* p < .05, ** p < .001). Cheating was assessed following a 6-weeks delay.
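The following is a minimal sketch, in Python, of the kind of analysis summarized in Table 2: a logistic regression of a binary cheating indicator on IM and HH entered simultaneously, plus a residualization-based semi-partial correlation. It is not the analysis code used in the study; the data are simulated and all variable names are hypothetical.

```python
# Illustrative sketch only (not the original analysis code). Simulated data
# mirror the structure of Table 2: cheating regressed on IM and HH, plus a
# semi-partial correlation for IM residualized for HH.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 461
df = pd.DataFrame({"IM": rng.normal(size=n), "HH": rng.normal(size=n)})

# Hypothetical outcome: cheating becomes less likely with higher honesty-humility
logit_p = -2.0 + 0.5 * df["IM"] - 1.3 * df["HH"]
df["cheating"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# Logistic regression with both predictors entered simultaneously
model = sm.Logit(df["cheating"], sm.add_constant(df[["IM", "HH"]])).fit(disp=False)
print(model.summary())
print("Odds ratios:", np.exp(model.params))

# Semi-partial correlation: residualize IM for HH, then correlate with cheating
im_resid = sm.OLS(df["IM"], sm.add_constant(df["HH"])).fit().resid
print("r(IM.HH, cheating) =", np.corrcoef(im_resid, df["cheating"])[0, 1])
```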

Table 3

Multiple Regression Models Predicting Behavior in the Dictator Games by IM and HH

Model / Estimate β SEβ t F p R² r(X•Y), DG

DG hyp 27.5 < .001 .11

IM –.16 .05 –3.20 .002 –.14*

HH –.21 .05 –4.16 < .001 –.18**

DG act 9.63 < .001 .04

IM .09 .05 1.70 .091 .08

HH –.23 .05 –4.35 < .001 –.20**

DG diff 11.4 < .001 .05

IM .23 .05 4.25 < .001 .19**

HH –.01 .05 –0.27 .786 –.01

Note. N = 461. IM = impression management; HH = honesty-humility; DG hyp = percentage of retained amount in hypothetical dictator game; DG act = percentage of retained amount in actual dictator game; DG diff = difference between DG act and DG hyp; r(X•Y), DG = semi-partial correlations between the respective DG measure and IM or HH residualized for the respective other (* p < .05, ** p < .001). DG act was assessed following a 6-weeks delay. Presented are the separate models of regressing DG hyp, DG act, and DG diff on IM and HH, respectively. Coefficients are standardized.

Study 3:

Müller, S., & Moshagen, M. (2018). Overclaiming shares processes with the hindsight bias.

Personality and Individual Differences, 134, 298–300.

https://doi.org/10.1016/j.paid.2018.06.035

© 2018 Elsevier Ltd. All rights reserved. Reprinted with permission.


Short Communication

Overclaiming shares processes with the hindsight bias

Sascha Müller, Morten Moshagen

Ulm University, Germany

Keywords: Overclaiming; Self-presentation; Response bias; Social desirability; Hindsight bias; Cognitive bias

Abstract

Overclaiming is the degree to which individuals claim knowledge about factually nonexistent terms. Whereas overclaiming is often considered to reflect motivational misreporting, we argue that overclaiming involves biased (meta-)cognitive processing, i.e., sense-making and the misattribution of fluency. These processes also contribute to the hindsight bias: When individuals receive the correct answer to a question, their recalled own prior answer tends to be biased towards the correct answer, which is partly due to biased reconstructions of the original answer. Supporting the idea that overclaiming and the hindsight bias involve similar processes, we show in the present study (N = 157) that overclaiming is positively related to reconstruction biases in the hindsight bias.

Author note: Preparation of this manuscript was supported by Grant MO 2830/1-1 from the German Research Foundation (DFG) to the second author. The data and script are available in the Open Science Framework (OSF) at https://osf.io/xbaep/. Corresponding author: Institute of Psychology and Education, Ulm University, Albert-Einstein-Allee 47, 89081 Ulm, Germany. E-mail: [email protected] (S. Müller).

1. Introduction

Overclaiming (Paulhus, Harms, Bruce, & Lysy, 2003) is the extent to which individuals claim to know terms that factually do not exist. It is assessed by providing individuals with a list of words comprising both existing and invented terms and asking them to indicate whether they know the words. Because this technique provides a deviation of a self-report from a known criterion in a socially valued domain, it has initially been conceived as a measure of self-presentation. While some studies supported this contention (e.g., Bing, Kluemper, Davison, Taylor, & Novicevic, 2011), others questioned the validity as a measure of self-presentation (e.g., Feeney & Goffin, 2015).

As a consequence, interpretations of overclaiming reflecting a cognitive bias emerged. Paulhus (2011) noted that overclaiming includes a memory bias in the form of a feeling of knowing alongside a motivational component. This notion is based on the circumstance that the nonexistent items are designed to appear plausible, resemble the factually existing terms, and are surrounded by potentially familiar existent terms. These factors arguably contribute to increased processing fluency even for nonexistent terms (Dunlop et al., 2017), which forms the basis of an illusion of familiarity. Perceived fluent processing of a meaningless term could lead to cognitive dissonance, which is resolved by misattributing the fluency to actual familiarity with the term or by triggering a sense-making process that may render nonexistent terms plausible (given the higher accessibility of consistent information).

The hindsight bias is a cognitive bias that occurs when individuals, after learning the outcome of a problem, believe to have known it all along. A common approach to measure the hindsight bias (the so-called memory design; Pohl, 2007) lets individuals answer knowledge questions, later presents the correct answers as feedback, and finally asks for a recollection of their original judgments (OJ). These recollections typically lie closer to the feedback than to the OJ.

Hindsight bias is assumed to occur as a result of both motivational and (meta-)cognitive processes (e.g., Roese & Vohs, 2012). However, empirical evidence in favor of motivated misreporting is rather weak and seems to be limited to non-memory designs or stimuli involving actual event outcomes (e.g., Musch, 2003). The hindsight bias as observed in memory designs involving knowledge questions is considered to result from biased cognitive processing (Hawkins & Hastie, 1990), in particular recollection and reconstruction biases. Recollection biases occur when the feedback distorts memory of the OJ or renders them inaccessible. Reconstruction biases occur when the feedback alters judgment processes involved in reconstructing the OJ upon failure to retrieve the OJ from memory.

Of particular interest in the present context are the notions that reconstruction biases may result from sense-making and the misattribution of processing fluency (e.g., Roese & Vohs, 2012). When confronted with the feedback, individuals may automatically attempt to integrate this information into their memory and try to engage in a sense-making process with the goal to create a causal explanation consistent with the new information. This may lead to the selective activation of feedback-compatible information in memory, in turn biasing the reconstruction of one's OJ towards this new informational basis. In addition, when individuals find it easy to arrive at a causal explanation consistent with the feedback or when the feedback itself is fluently processed (Werth & Strack, 2003), individuals may misattribute the subjective ease of processing to the judgment itself, resulting in a sense of familiarity, and thus a greater hindsight bias. Thus, the ease with which a reconstruction process is carried out is used to draw an inference in the judgment process.

Correspondingly, overclaiming could involve processes similar to reconstruction biases in the hindsight bias. In an overclaiming task, individuals may first attempt to retrieve the terms directly from memory, which, obviously, must fail for nonexistent terms. However, given that nonexistent items are intentionally designed to appear plausible and are embedded in a majority of factually existing and potentially familiar terms, individuals may experience elevated processing fluency, resulting in cognitive dissonance. The feeling of knowing a nonexistent term would hence be a result of a misattribution of fluency or a successful sense-making process initiated with the goal to resolve the cognitive dissonance. The present study puts these notions to a test. A positive association between overclaiming and the hindsight bias was expected, reflecting that these effects share similar underlying cognitive processes.

2. Method

2.1. Participants and procedure

A power analysis indicated that N = 150 is required to detect a correlation of .20 with a power of .80 on an alpha-error level of .05. The sample consisted of 157 participants (79.6% female). Participants were recruited at a medium-sized German university via flyers and social media. Course credit was offered as compensation. Mean age was 22.3 years (SD = 5.98). The study was conducted online. After providing informed consent, participants completed the first part of the hindsight bias task, followed by the overclaiming task, and the second part of the hindsight bias task.

2.2. Materials

2.2.1. Overclaiming

We used 60 terms spanning three domains taken from previous studies (e.g., Dunlop et al., 2017). Each domain comprised 12 targets (e.g., "categorical imperative") and 8 lures (e.g., "paradox of duplicity"). Item order was random. It was introduced as a general knowledge test. Participants were asked whether they knew the term. Signal detection theory was used to determine the location of the criterion, c = −.5 * [z(H) + z(FA)], based on the hit (H) and the false alarm (FA) rates. Since c defines the threshold to respond "Yes", we measured overclaiming by OCI = (−1) * c, so that higher values represent more overclaiming (Paulhus et al., 2003). We also computed the index of discrimination accuracy, d′ = z(H) − z(FA), which is a measure of knowledge (Paulhus & Harms, 2004).

2.2.2. Hindsight bias

We used 40 almanac-type items from the set employed by Pohl, Bayen, and Martin (2010), each asking for numeric estimates (e.g., "How long is the border between Germany and Denmark?"). The items were randomly divided into 20 control and 20 experimental items (counterbalanced across participants). First, participants provided their OJ to all items. At the end, participants were asked to recall their OJ. For experimental items (but not for control items), the correct answer was provided as feedback. Item order was random.

For hindsight bias to occur, the recalled judgments of the experimental items should be distorted towards the feedback values. To dissociate recollection and reconstruction biases, we employed the HB13 model (Erdfelder & Buchner, 1998), a multinomial processing tree (MPT) model allowing for the decomposition of the different processes involved in the hindsight bias. MPT models are stochastic models that attempt to explain observed behavioral data by a sequence of latent states that are interpreted as psychological processes (Erdfelder et al., 2009). The psychological core parameters of the HB13 distinguish recollection and biased reconstruction. The model assumes that individuals first try to recollect their OJ, with the probabilities of a successful recall measured by the parameters rC and rE for control and experimental items, respectively. Hence, rC and rE reflect the probability to successfully retrieve one's OJ from memory. Biased recollection is evident when the probability of a correct recall decreases when feedback is presented, and is thus expressed as the difference between rC and rE. When participants fail to retrieve the OJ from memory, the HB13 assumes that individuals attempt to reconstruct their OJ. For experimental items, this reconstruction process can be biased towards the feedback, as measured by the b parameter. Thus, b reflects the probability that the reconstruction of one's OJ is biased towards the feedback. Hindsight bias is evident if the b parameter is larger than zero (biased reconstruction), which may be accompanied by a positive difference between rC and rE (biased recollection).

3. Results

The parameters of the HB13 were estimated separately for each individual using MPTinR (Singmann & Kellen, 2013). Given the comparatively sparse data, we restricted all parameters of the HB13 that do not measure cognitive processes to be equal across control and experimental items, yielding a submodel involving 9 free parameters (see Erdfelder & Buchner, 1998, p. 395). All ΔBICs (Moshagen, 2010) were negative, indicating that the constrained model fitted the data well. Two participants were excluded because their data led to empirically underidentified parameters.

The b parameter was significantly larger than zero, t(154) = 20.9, p < .001, indicating an average probability of 54% to bias one's reconstruction of the OJ towards the feedback upon failure to retrieve the OJ from memory. In contrast, there was no significant difference between rC and rE, t(303.2) = 1.42, p = .16, showing that the feedback did not affect the probability to recall one's OJ (Table 1). Hence, hindsight bias occurred due to biased reconstruction (b), but not due to biased recollection (rC − rE).

The extent of overclaiming was positively correlated with the probability of biased reconstruction (r = .20, p = .011). In interpreting the magnitude of this effect, it must be kept in mind that correlations involving cognitive tasks usually average around .20 (e.g., Teovanović, Knežević, & Stankov, 2015). In contrast, d′ was not associated with hindsight bias.

Table 1
Descriptive statistics and correlations between parameters.

Variable M SD OCI d′ b
1. Hit rate .72 .11
2. FA rate .25 .16
3. OCI −0.07 0.38
4. d′ 1.39 0.56 −.46*
5. b .54 .32 .20* −.14
6. rC − rE .03 .17 .02 .01 −.07

Note. N = 157. OCI = overclaiming index; d′ = discrimination accuracy; b = biased reconstruction; rC − rE = biased recollection. * p < .05.

4. Discussion and conclusion

Overclaiming has been introduced as an indicator of self-favoring biases. However, it may not (only) reflect motivated misreporting but (also) cognitive biases, in turn rendering these scores ambiguous. We argued that overclaiming involves cognitive processes that also underlie reconstruction biases in the hindsight bias. Supporting this hypothesis, we showed that overclaiming and reconstruction biases are positively related.

This result can be explained by considering that overclaiming could occur when individuals try to integrate the nonexistent but plausibly designed terms into their knowledge upon exposure, leading to the illusory familiarity claim. The plausibility of the items arguably makes them easy to process, which may be misattributed to actual familiarity or might trigger a sense-making process to resolve cognitive dissonance. Crucially, sense-making and fluency misattribution are discussed to contribute to reconstruction biases in the hindsight bias. In the same vein as the plausible overclaiming items might be fluently processed, processing fluency of the feedback also seems to increase hindsight bias (e.g., Werth & Strack, 2003), which may lead to reconstruction biases as it facilitates retrieval of feedback-consistent information.

Of note, it seems unlikely that the positive association can be attributed to self-presentation concerns. In memory designs of the hindsight bias, it is unclear which outcome individuals perceive as desired. One may be equally motivated to claim to have known it all along as well as to reproduce one's OJ as closely as possible. Correspondingly, hindsight bias in memory designs was shown to be independent of self-presentation and related motivational factors (e.g., Musch, 2003).

Nonetheless, there are alternative explanations of the results. Biased reconstruction may also occur when individuals use the feedback when searching their memory for relevant knowledge upon retrieval failure, which does not necessarily involve misattribution of fluency or knowledge integration. Furthermore, hindsight bias only accounted for a rather small fraction of the variance in overclaiming, leaving room for other processes being involved.

In conclusion, the present study indicates that overclaiming involves biased cognitive processes akin to those underlying reconstruction biases in the hindsight bias. This suggests that elevated scores on overclaiming should not be interpreted as reflecting motivated misreporting, as these may also result from automatic cognitive processes unrelated to self-presentation.

References

Bing, M. N., Kluemper, D., Davison, H. K., Taylor, S., & Novicevic, M. (2011). Overclaiming as a measure of faking. Organizational Behavior and Human Decision Processes, 116(1), 148–162.

Dunlop, P. D., Bourdage, J. S., de Vries, R. E., Hilbig, B. E., Zettler, I., & Ludeke, S. G. (2017). Openness to (reporting) experiences that one never had: Overclaiming as an outcome of the knowledge accumulated through a proclivity for cognitive and aesthetic exploration. Journal of Personality and Social Psychology, 113(5), 810–834.

Erdfelder, E., Auer, T. S., Hilbig, B. E., Aßfalg, A., Moshagen, M., & Nadarevic, L. (2009). Multinomial processing tree models: A review of the literature. Zeitschrift für Psychologie/Journal of Psychology, 217(3), 108–124.

Erdfelder, E., & Buchner, A. (1998). Decomposing the hindsight bias: A multinomial processing tree model for separating recollection and reconstruction in hindsight. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24(2), 387–414.

Feeney, J. R., & Goffin, R. D. (2015). The overclaiming questionnaire: A good way to measure faking? Personality and Individual Differences, 82, 248–252.

Hawkins, S. A., & Hastie, R. (1990). Hindsight: Biased judgments of past events after the outcomes are known. Psychological Bulletin, 107(3), 311–327.

Moshagen, M. (2010). multiTree: A computer program for the analysis of multinomial processing tree models. Behavior Research Methods, 42(1), 42–54.

Musch, J. (2003). Personality differences in hindsight bias. Memory, 11(4–5), 473–489.

Paulhus, D. L. (2011). Overclaiming on personality questionnaires. In M. Ziegler, C. MacCann, & R. D. Roberts (Eds.), New perspectives on faking in personality assessment (pp. 151–164). New York: Oxford University Press.

Paulhus, D. L., & Harms, P. D. (2004). Measuring cognitive ability with the overclaiming technique. Intelligence, 32(3), 297–314.

Paulhus, D. L., Harms, P. D., Bruce, M. N., & Lysy, D. C. (2003). The over-claiming technique: Measuring self-enhancement independent of ability. Journal of Personality and Social Psychology, 84(4), 890–904.

Pohl, R. F. (2007). Ways to assess hindsight bias. Social Cognition, 25(1), 14–31.

Pohl, R. F., Bayen, U. J., & Martin, C. (2010). A multiprocess account of hindsight bias in children. Developmental Psychology, 46(5), 1268–1282.

Roese, N. J., & Vohs, K. D. (2012). Hindsight bias. Perspectives on Psychological Science, 7(5), 411–426.

Singmann, H., & Kellen, D. (2013). MPTinR: Analysis of multinomial processing tree models in R. Behavior Research Methods, 45(2), 560–575.

Teovanović, P., Knežević, G., & Stankov, L. (2015). Individual differences in cognitive biases: Evidence against one-factor theory of rationality. Intelligence, 50, 75–86.

Werth, L., & Strack, F. (2003). An inferential approach to the knew-it-all-along phenomenon. Memory, 11(4–5), 411–419.
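The signal detection indices used in the reprinted article (Section 2.2.1) can be illustrated with a short sketch. The formulas for c, OCI, and d′ are taken from the article; the example counts and the correction for extreme hit or false-alarm rates are assumptions for illustration, not necessarily the conventions used in the original analyses.

```python
# Minimal sketch of the signal detection indices from Section 2.2.1:
# criterion c = -.5 * [z(H) + z(FA)], overclaiming index OCI = -c, and
# discrimination accuracy d' = z(H) - z(FA). Example counts are hypothetical.
from scipy.stats import norm

def overclaiming_indices(hits, misses, false_alarms, correct_rejections):
    """Compute (OCI, d') from yes/no responses to existing and nonexistent terms."""
    # Shrink rates slightly so that z() stays finite for perfect scores
    h = (hits + 0.5) / (hits + misses + 1)
    fa = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    c = -0.5 * (norm.ppf(h) + norm.ppf(fa))   # response criterion
    oci = -c                                  # higher values = more overclaiming
    d_prime = norm.ppf(h) - norm.ppf(fa)      # knowledge (discrimination accuracy)
    return oci, d_prime

# Hypothetical participant: 36 existing terms (targets) and 24 nonexistent terms (foils)
oci, d_prime = overclaiming_indices(hits=26, misses=10, false_alarms=6, correct_rejections=18)
print(f"OCI = {oci:.2f}, d' = {d_prime:.2f}")
```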

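The HB13 model itself comprises 13 parameters and was fit with MPTinR in the article; reproducing it here would go beyond a short example. The following toy sketch only illustrates the general MPT logic of mapping latent process probabilities onto category probabilities and estimating them by maximum likelihood, using a simple one-high-threshold recognition model with hypothetical counts rather than the HB13.

```python
# Toy illustration of multinomial processing tree (MPT) estimation, not the
# HB13 model used in the study. One-high-threshold model for old/new
# recognition: P(hit) = r + (1 - r) * g, P(false alarm) = g, where r is the
# probability of recollection and g the probability of guessing "old".
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, hits, misses, fas, crs):
    r, g = params
    p_hit = r + (1 - r) * g
    p_fa = g
    eps = 1e-10  # guard against log(0)
    return -(hits * np.log(p_hit + eps) + misses * np.log(1 - p_hit + eps)
             + fas * np.log(p_fa + eps) + crs * np.log(1 - p_fa + eps))

# Hypothetical response frequencies for one participant
counts = dict(hits=32, misses=8, fas=10, crs=30)
fit = minimize(neg_log_lik, x0=[0.5, 0.5], args=tuple(counts.values()),
               bounds=[(0.001, 0.999)] * 2)
r_hat, g_hat = fit.x
print(f"recollection r = {r_hat:.2f}, guessing g = {g_hat:.2f}")
```

With these counts the maximum-likelihood estimates recover the closed-form solution (g equals the false-alarm rate, r the corrected hit rate), which is the same principle by which the HB13 parameters rC, rE, and b are identified from richer response categories.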