Psychometrics
ORIGINAL ARTICLE

Luiz Pasquali¹

ABSTRACT

Psychometrics is founded on the theory of measurement in the sciences. It aims to explain the meaning of the responses given by subjects to a series of tasks and to propose techniques for measuring mental processes. This article presents the concepts and models of modern psychometrics and discusses the validity and reliability parameters of tests.

KEY WORDS

Psychometrics. Reproducibility of results. Validity of tests. Validation studies.

¹ Researcher Professor Associated with the University of Brasilia. Brasilia, DF, Brazil.
Received: 15/06/2008. Approved: 15/12/2008.
Rev Esc Enferm USP 2009; 43(Spe):992-9. www.ee.usp.br/reeusp/
Portuguese / English: www.scielo.br/reeusp

INTRODUCTION

Measurement in psychosocial sciences

Psychometrics is, etymologically, the theory and technique of measuring mental processes, and it is applied especially in psychology and education. It is grounded in the general theory of measurement in the sciences, that is, in the quantitative method, whose major characteristic is that it represents knowledge of nature more precisely than common language does when describing the observation of natural phenomena.

Historically, psychometrics stems from the psychophysics of the Germans Ernst Heinrich Weber and Gustav Fechner. The Englishman Francis Galton also contributed to its development by creating tests to measure mental processes; indeed, he is regarded as the creator of psychometrics. However, it was Leon Louis Thurstone, the inventor of multiple factor analysis, who gave psychometrics new impetus and distinguished it from psychophysics. Psychophysics was defined as the measurement of directly observed processes, in other words, the organism's stimulus and response, while psychometrics consists in measuring the organism's behavior by means of mental processes (law of comparative judgment).

Measurement in the sciences has provoked heated debate among researchers, particularly in the social sciences. Nonetheless, the most widely accepted definition was given by Stanley Smith Stevens in 1946: to measure is to assign numbers to objects and events in accordance with given rules(1). The rules for assigning such numbers are defined by the same author's proposal of four measurement levels, or measurement typologies: nominal, ordinal, interval, and ratio.

Nominal measurement applies numbers to natural phenomena while preserving only the axioms of number identity; that is, the number is employed only as a numeric or graphic symbol. The ordinal level additionally preserves the axioms of order, that is, the number's defining characteristic, its magnitude (by definition, a given number is not merely different from another but greater or smaller than it, because its value is intrinsically higher or lower). The remaining levels also preserve the axioms of additivity. The history of these axioms was detailed by Whitehead and Russell between 1910 and 1913, and again in 1965, in their book Principia Mathematica, where they describe the 27 famous axioms of the mathematical number(2).

PSYCHOMETRICS: CONCEPT AND MODELS

Modern psychometrics can be traced back to two sources: classical test theory (CTT) and item response theory (IRT). CTT was axiomatized by Gulliksen(3); IRT was initially elaborated by Lord(4) and Rasch(5), and finally axiomatized by Birnbaum(6) and Lord(7).

In a general sense, psychometrics attempts to explain the meaning of the responses given by subjects to a series of tasks, typically called items. CTT aims to explain the final total result, that is, the sum of the responses given to a series of items, expressed by the so-called total score (S). For instance, the S in a test of 30 capability items would be the number of correctly answered items. If the value 1 were given to each correct item and 0 to each incorrect one, and the subject answered 20 items correctly and 10 incorrectly, this person's score S would be 20. CTT then asks: what does this total of 20 mean for the subject? IRT, on the other hand, is not interested in the total test score; it focuses on each one of the 30 items and asks what the probability is, and which factors influence that probability, of each individual item being answered correctly or incorrectly (in capability tests) or being accepted or rejected (in preference tests: personality, interests, attitudes). Thus, CTT is interested in producing quality tests, while IRT is focused on developing quality tasks (items). In the end, therefore, we have either valid tests (CTT) or valid items (IRT), and valid items can be used to build as many valid tests as desired, or as many as the available items allow. The richness of psychological or educational assessment within the scope of IRT therefore lies in building stores of valid items that assess latent traits, called item banks, from which countless tests can be assembled.

The CTT model was elaborated by Spearman and detailed by Gulliksen, as follows:

T = TS + E

where,

T = the subject's total or empirical score, which is the sum of the scores obtained on all the items of the test;

TS = the true score, which is the real magnitude in the subject of what the test is intended to measure; the true score equals the total score itself when there is no measurement error;

E = the measurement error.

In this way, the empirical score is the sum of the true score and the error; consequently, E = T – TS, and TS = T – E.
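The scoring example and the CTT decomposition above can be illustrated with a minimal Python sketch. It is not part of the original article: the function names, the true score of 20, and the choice of a zero-mean normal distribution with standard deviation 2 for the error E are assumptions made purely for illustration.

```python
import random
import statistics


def total_score(item_scores):
    """Total score S: sum of dichotomous item scores (1 = correct, 0 = incorrect)."""
    return sum(item_scores)


def empirical_score(true_score, error_sd, rng):
    """Return (T, E) with T = TS + E.

    Assumption for illustration only: E is drawn from a normal
    distribution with mean 0 and standard deviation error_sd.
    """
    error = rng.gauss(0.0, error_sd)
    return true_score + error, error


# The article's example: 30 items, 20 answered correctly and 10 incorrectly.
item_scores = [1] * 20 + [0] * 10
S = total_score(item_scores)

# One simulated measurement of a subject whose (hypothetical) true score is 20.
rng = random.Random(0)
TS = 20.0
T, E = empirical_score(TS, error_sd=2.0, rng=rng)

# Under the zero-mean assumption, T averages out close to TS over many replications.
replications = [empirical_score(TS, 2.0, rng)[0] for _ in range(10_000)]
mean_T = statistics.fmean(replications)

print(f"S = {S}")
print(f"T = {T:.2f}, TS = {TS:.2f}, E = {E:.2f}")
print(f"mean of T over replications = {mean_T:.2f}")
```

In practice TS and E are never observed separately; the simulation fixes TS only to make the identities E = T – TS and TS = T – E visible.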
Figure 1 shows the relationship among the elements of the empirical score, where the union of the true score (TS) and the error score (ES) can be observed; that is to say, the subject's empirical or gross score (T, the test result, known as the Tau score, τ) comprises two components: the subject's real or true score (TS) on what the test intends to measure, and the error score (ES) of the measurement, which is always present in any empirical operation. In other words, we assume here that whenever the subject's gross score differs from his or her true score, it is the error that accounts for the disparity; this difference is the very concept of error.

Figure 1 - The true score (TS) components (diagram relating T, TS and ES)

Hence, the ultimate challenge of CTT is to devise (statistical) strategies to either control or evaluate the magnitude of E. Errors are caused by a wide range of extraneous factors identified by Campbell and Stanley(8), such as the test's own deficiencies, stereotypes and biases of the subject, historical factors, and random historical and environmental factors.

Figure 2 - The item's characteristic curve (vertical axis: probability Pi(θ), from 0 to 1.00; horizontal axis: capability θ, from 1 to 8)

IRT concretely affirms the following: the subject is given a stimulus or a series of stimuli (such as the items of a test) and responds to them. From an analysis of the subject's responses to the specified items, we can make inferences about the subject's latent trait by hypothesizing relationships between the observed responses and the level of that latent trait. These relationships can be expressed by a mathematical equation that describes the type of function they take. In fact, only a limited number of mathematical models are able to express such relationships, depending on the type of mathematical function applied and/or the number of parameters one wants to estimate for the item. A remarkable advantage of IRT over the classical theory is that the models IRT employs allow for disconfirmation. In effect, demonstrating the compatibility between the model and the data (model-data goodness-of-fit) is a necessary step in this theory's procedures. Specialized statistical packages are needed to carry out IRT analyses, as they are