Construct Validity
Associate Prof. Dr Anne Yee, Dr Mahmoud Danaee

What does this resemble?

Rorschach test
• At the end of the test, the tester says …
  – you need therapy
  – or you can't work for this company

Psychological Testing
• Occurs widely …
  – in personnel selection
  – in clinical settings
  – in education
• What constitutes a good test?

Validity and Reliability
• Validity: How well does the measure or design do what it purports to do?
• Reliability: How consistent or stable is the instrument? Is the instrument dependable?

Validity
• Logical: Face, Content
• Statistical: Construct (Convergent, Divergent/Discriminant); Criterion (Concurrent, Predictive)
Reliability
• Consistency
• Objectivity

Face Validity
– Infers that a test is valid by face value
– It is clear that the test measures what it is supposed to
– As a check on face validity, test/survey items are sent to experts to obtain suggestions for modification.
– Because of its vagueness and subjectivity, psychometricians abandoned this concept long ago.

Content Validity
– Infers that the test measures all aspects contributing to the variable of interest
– Face validity vs content validity:
  • Face validity can be established by one person
  • Content validity should be checked by a panel, and thus it usually goes hand in hand with inter-rater reliability (Kappa!)

Example:
• Computer literacy includes skills in operating systems, word processing, spreadsheets, databases, graphics, the internet, and many others.
• It is difficult to administer a test covering all aspects of computing. Therefore, only several tasks are sampled from the universe of computer skills.
• A test of computer literacy should be written or reviewed by computer science professors or senior programmers in the IT industry, because it is assumed that computer scientists know what is important in their own discipline.

Overall: a logically valid test simply appears to measure the right variable in its entirety. Subjective!

The Content Validity Index
Content validity has been defined as follows:
• (1) "…the degree to which an instrument has an appropriate sample of items for the construct being measured" (Polit & Beck, 2004, p. 423);
• (2) "…whether or not the items sampled for inclusion on the tool adequately represent the domain of content addressed by the instrument" (Waltz, Strickland, & Lenz, 2005, p. 155);
• (3) "…the extent to which an instrument adequately samples the research domain of interest when attempting to measure phenomena" (Wynd, Schmidt, & Schaefer, 2003, p. 509).

Two types of CVIs
• Content validity of individual items (I-CVI). Researchers use I-CVI information to guide them in revising, deleting, or substituting items. I-CVIs tend only to be reported in methodological studies that focus on descriptions of the content validation process.
• Content validity of the overall scale (S-CVI). The CVI is what is most often reported in scale development studies.
  – S-CVI/UA: proportion of items on the scale that achieve a relevance rating of 3 or 4 from all of the experts (universal agreement).
  – S-CVI/Ave: average of the I-CVIs.
• CVI: the degree to which an instrument has an appropriate sample of items for the construct being measured.

Expert rating form (example)
Each expert rates every item on four criteria, with space for comments:
• Representativeness: are the items representative of concepts related to the dissertation topic?
• Clarity: is each question clearly worded?
• Consistency: are the items' concepts consistent?
• Relevance: does each item have relevance to the instrument?

Item   Representativeness   Clarity     Consistency   Relevance    Comments
Q1     ① ② ③ ④              ① ② ③ ④     ① ② ③ ④       ① ② ③ ④
Q2     ① ② ③ ④              ① ② ③ ④     ① ② ③ ④       ① ② ③ ④
Q3     ① ② ③ ④              ① ② ③ ④     ① ② ③ ④       ① ② ③ ④
Q4     ① ② ③ ④              ① ② ③ ④     ① ② ③ ④       ① ② ③ ④
Q5     ① ② ③ ④              ① ② ③ ④     ① ② ③ ④       ① ② ③ ④

Ratings: 1 = not relevant, 2 = somewhat relevant, 3 = quite relevant, 4 = highly relevant.
I-CVI: item-level content validity index. S-CVI: content validity index for the overall scale.
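These indices are simple proportions, so they can be computed directly from a completed rating form. The following is a minimal sketch in Python; the ratings matrix (six experts, five items) is invented for illustration and is not from the source slides.

```python
import numpy as np

# Hypothetical relevance ratings: 5 items (rows) rated by 6 experts (columns)
# on the 1-4 scale from the form above (1 = not relevant ... 4 = highly relevant).
ratings = np.array([
    [4, 3, 4, 4, 3, 4],   # Q1
    [3, 4, 4, 2, 4, 3],   # Q2
    [2, 2, 3, 2, 1, 2],   # Q3
    [4, 4, 4, 4, 4, 3],   # Q4
    [3, 4, 2, 4, 3, 4],   # Q5
])

# Dichotomize: a rating of 3 or 4 counts as "relevant".
relevant = ratings >= 3

# I-CVI: proportion of experts who rate each item as relevant.
i_cvi = relevant.mean(axis=1)

# S-CVI/Ave: the average of the I-CVIs.
s_cvi_ave = i_cvi.mean()

# S-CVI/UA: proportion of items rated 3 or 4 by *all* experts (universal agreement).
s_cvi_ua = relevant.all(axis=1).mean()

for item, value in zip(["Q1", "Q2", "Q3", "Q4", "Q5"], i_cvi):
    print(f"{item}: I-CVI = {value:.2f}")
print(f"S-CVI/Ave = {s_cvi_ave:.2f}   S-CVI/UA = {s_cvi_ua:.2f}")
```

On these invented ratings, Q3 would be flagged (I-CVI ≈ 0.17), while Q1 and Q4 achieve universal agreement; how such values are judged is covered by the acceptance standards below.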
Acceptable standards
• A minimum S-CVI of .80 is recommended.
• If the I-CVI is higher than 79%, the item is appropriate; if it is between 70% and 79%, it needs revision; if it is less than 70%, it is eliminated.

Kappa statistic
• The kappa statistic is a consensus index of inter-rater agreement that adjusts for chance agreement. It is an important supplement to the CVI because kappa provides information about the degree of agreement beyond chance.
• Evaluation criteria for kappa: above 0.74 = excellent; 0.60 to 0.74 = good; 0.40 to 0.59 = fair.

Validity (recap): Logical (Face, Content); Statistical (Construct: Convergent, Divergent/Discriminant; Criterion: Concurrent, Predictive).

Criterion Validity
• This type of validity is used to measure the ability of an instrument to predict future outcomes.
• Validity is usually determined by comparing two instruments' ability to predict a similar outcome, with a single variable being measured.
• There are two major types of criterion validity: predictive and concurrent.
• Example: the 'Warwick spider phobia questionnaire' shows a positive correlation with the SPQ.
• A test has high criterion validity if:
  – it correlates highly with some external benchmark (concurrent);
  – it correlates well with outcome criteria (predictive).
• E.g. if you have lost 30 pounds and your scale reports that you lost 30 pounds, you would expect that your clothes would also feel looser.

Concurrent Criterion Validity
• Concurrent criterion validity is used when two instruments are used to measure the same event at the same time.
• Example: …

Predictive Criterion Validity
• Predictive validity is used when the instrument is administered, time is allowed to pass, and the result is then measured against another outcome.
• Example: …

Criterion validity and regression
• When the focus of the test is on criterion validity, we draw an inference from test scores to performance. A high score on a valid test indicates that the test taker has met the performance criteria.
• Regression analysis can be applied to establish criterion validity. An independent variable is used as the predictor variable and a dependent variable as the criterion variable. The correlation coefficient between them is called the validity coefficient.

How is Criterion Validity Measured?
• The correlation coefficient tells the degree to which the instrument is valid, based on the measured criteria.
• What does it look like in an equation? The symbol "r" denotes the (Pearson) correlation coefficient:
  r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[ Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² ]
• A large positive "r" value shows a strong positive relationship between the instruments; a negative "r" value shows an inverse relationship.
(Figures: Predictive Validity; Concurrent Validity.)
• As a rule of thumb, for the absolute value of r:
  – 0.00–0.19: very weak
  – 0.20–0.39: weak
  – 0.40–0.59: moderate
  – 0.60–0.79: strong
  – 0.80–1.00: very strong
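To see how a validity coefficient might be obtained in practice, here is a minimal sketch in Python using SciPy's Pearson correlation. The predictive-validity scenario, the test scores, and the later performance ratings are all invented for illustration; they are not from the source slides.

```python
import numpy as np
from scipy import stats

# Hypothetical data: an aptitude test given at hiring (predictor)
# and supervisor-rated job performance six months later (criterion).
test_scores = np.array([52, 61, 45, 70, 58, 66, 49, 74, 63, 55])
job_perf    = np.array([3.1, 3.8, 2.9, 4.4, 3.5, 4.0, 2.7, 4.6, 3.9, 3.3])

# Pearson correlation coefficient r serves as the validity coefficient.
r, p_value = stats.pearsonr(test_scores, job_perf)
print(f"validity coefficient r = {r:.2f} (p = {p_value:.3f})")

# Interpret |r| against the rule of thumb listed above.
bands = [(0.20, "very weak"), (0.40, "weak"), (0.60, "moderate"),
         (0.80, "strong"), (1.01, "very strong")]
label = next(name for cutoff, name in bands if abs(r) < cutoff)
print(f"strength: {label}")
```

The same calculation with both measures taken at the same time would correspond to concurrent rather than predictive criterion validity; only the timing of the criterion measure changes.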
Validity (recap): Logical (Face, Content); Statistical (Construct: Convergent, Divergent/Discriminant; Criterion: Concurrent, Predictive).

Construct Validity
• Measuring things that are in our theory of a domain.
• The construct is sometimes called a latent variable:
  – You can't directly observe the construct
  – You can only measure its surface manifestations
• Because it is concerned with abstract and theoretical constructs, construct validity is also known as theoretical validity.

What are Latent Variables?
• Most, if not all, variables in the social world are not directly observable.
• This makes them 'latent' or hypothetical constructs.
• We measure latent variables with observable indicators, e.g. questionnaire items.
• We can think of the variance of an observable indicator as being partially caused by:
  – The latent construct in question
  – Other factors (error)

Example: math anxiety as a latent construct, indicated by items such as:
• I cringe when I have to go to math class.
• I am uneasy about going to the board in a math class.
• I am afraid to ask questions in math class.
• I am always worried about being called on in math class.
• I understand math now, but I worry that it's going to get really difficult soon.

Formative versus reflective constructs
• Specifying formative versus reflective constructs is a critical preliminary step prior to further statistical analysis. Specification follows these guidelines:
• Formative
  – Direction of causality is from measure to construct
  – No reason to expect the measures are correlated
  – Indicators are not interchangeable
• Reflective
  – Direction of causality is from construct to measure
  – Measures are expected to be correlated
  – Indicators are interchangeable
• An example of formative versus reflective constructs is given in the accompanying figure.

Factor model
• A factor model identifies the relationship between observed items and latent factors. For example, when a psychologist wants to study the causal relationship between math anxiety and job performance, he or she first has to define the constructs "math anxiety" and "job performance." To accomplish this step, the psychologist needs to develop items that measure the defined constructs.

Construct, dimension, subscale, factor, component
• This construct has eight dimensions (e.g. intelligence has eight aspects)
• This scale has eight subscales (e.g. the survey measures different but weakly related things)
• The factor structure has eight factors/components (e.g. in factor analysis/PCA)

Exploratory Factor Analysis
• Exploratory factor analysis (EFA) is a statistical approach to determining the correlation among the variables in a dataset.
• This type of analysis provides a factor structure (a grouping of variables based on strong correlations).
• EFA is good for detecting "misfit" variables. In general, an EFA prepares the variables to be used for cleaner structural equation modeling. An EFA should always be conducted …
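As a rough illustration of what an EFA looks like in code, the following is a minimal sketch using scikit-learn's FactorAnalysis (available with varimax rotation in recent versions) on simulated item responses. The items, factor labels, and loadings are invented; a real study would use observed questionnaire data and often dedicated EFA software.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Simulate 300 respondents answering 6 items: items 1-3 are driven by one
# latent factor (e.g. "math anxiety"), items 4-6 by another (hypothetical labels).
n = 300
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
noise = rng.normal(scale=0.5, size=(n, 6))
items = np.column_stack([
    0.8 * f1, 0.7 * f1, 0.9 * f1,   # indicators of factor 1
    0.8 * f2, 0.7 * f2, 0.9 * f2,   # indicators of factor 2
]) + noise

# Exploratory factor analysis with 2 factors and varimax rotation.
efa = FactorAnalysis(n_components=2, rotation="varimax")
efa.fit(items)

# Loadings show which items group together (the factor structure):
# rows = items, columns = factors; large values indicate strong loadings.
print(np.round(efa.components_.T, 2))
```

In this toy setup the loadings matrix should show items 1-3 grouping on one factor and items 4-6 on the other, which is exactly the kind of grouping by strong correlations the slides describe.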