A CONSTRUCT VALIDATION STUDY OF PHONOLOGICAL AWARENESS FOR

CHILDREN ENTERING PRE-

by

MI-YOUNG WEBB

(Under the Direction of Seock-Ho Kim)

ABSTRACT

The purpose of this study was to determine the psychometric characteristics of phonological awareness assessment in pre-kindergarten children based on Messick’s (1989) framework for unitary construct . Four hundred and fifteen pre- kindergarten children were given eight tasks of phonological awareness drawn from “The Phonological Awareness ” (Robertson & Salter, 1997). The four aspects of , including content, substantive, structural, and external aspects were examined. The item analysis indicated a high internal consistency; however, the levels of item difficulty for each task were fairly difficult for this age group. Factor analysis with varimax rotation revealed that two factors may underlie the phonological awareness measurement. Although the effect size was small, multiple regression analysis indicated a linear combination of two tasks had a statistically significant predictive validity for beginning alphabet sound knowledge in pre-kindergarten.

INDEX WORDS: Validation, Messick’s unitary construct validity, Reading, Phonological awareness, Assessment

A CONSTRUCT VALIDATION STUDY OF PHONOLOGICAL AWARENESS FOR

CHILDREN ENTERING PRE-KINDERGARTEN

by

MI-YOUNG WEBB

B.S, The Cheongju University, South Korea, 1997

A Thesis Submitted to the Graduate Faculty of The University of Georgia in Partial

Fulfillment of the Requirement for the Degree

MASTER OF ARTS

ATHENS, GEORGIA

2003

© 2003

Mi – young Webb

All Rights Reserved

A CONSTRUCT VALIDATION STUDY OF PHONOLOGICAL AWARENESS FOR

CHILDREN ENTERING PRE-KINDERGARTEN

by

MI – YOUNG WEBB

Major Professor: Seock – Ho Kim

Committee: Steve Olejnik Paula Schwanenflugel Electronic Version Approved:

Maureen Grasso Dean of the Graduate School The University of Georgia August 2003 iv

TABLES OF CONTENTS

Page

LIST OF TABLES ...... vi

LIST OF FIGURES ...... viii

CHAPTER

I INTRODUCTION...... 1

Reading and Academic Performance...... 1

The Component of Reading Acquisition...... 1

Overview...... 2

II PHONOLOGICAL AWARENESS ...... 3

Definition of Phonological Awareness ...... 3

The Role of Phonological Awareness in Reading Acquisition...... 4

Developmental Sequence of Phonological Awareness...... 5

Validity Test for Phonological Awareness Tasks...... 7

Purpose of the Study...... 9

III VALIDITY...... 10

Traditional Conception of Validity...... 10

Unified Conception of Validity...... 13

Validity as Integrated Evidence ...... 21

Facets of the Unitary Validity...... 22

IV METHOD ...... 24 v

Participants...... 24

Materials ...... 25

Procedure ...... 31

V RESILTS ...... 35

The Content Aspect of Construct Validity...... 35

The Substantive Aspect of Construct Validity...... 35

The Structural Aspect of Construct Validity...... 37

The External Aspect of Construct Validity...... 41

VI DISCUSSION...... 45

The Content Aspect of Construct Validity...... 45

The Substantive Aspect of Construct Validity...... 48

The Structural Aspect of Construct Validity...... 50

The Generalizability Aspect of Construct Validity...... 55

The External Aspect of Construct Validity...... 56

The Consequential Aspect of Construct Validity...... 59

VII CONCLUSION ...... 61

REFERENCES ...... 63

APPENDIX: PHONOLOGICAL AWARENESS TEST...... 96

vi

LIST OF TABLES

Page

Table 1: The Maximum Scores, the Means, and the Standard Deviations for Phonological

Awareness Tasks Based on the Preliminary Item Condition...... 69

Table 2: The Maximum Scores, the Means, and the Standard Deviations fro Phonological

Awareness Tasks Based on the Actual Item Condition...... 70

Table 3: Coefficients Alpha and the Standard Error of Measurements for Phonological

Awareness Tasks Based on the Preliminary Item Condition...... 71

Table 4: Coefficients Alpha and the Standard Error of Measurements for Phonological

Awareness Tasks Based on the Actual Item Condition...... 72

Table 5: The Mean Levels of Task Difficulty of Phonological Awareness ...... 73

Table 6: Item Analyses for Rhyming Discrimination Task Based on the Preliminary Item

Condition...... 74

Table 7: Item Analyses for Rhyming Discrimination Task Based on the Actual Item

Condition...... 75

Table 8: Item Analyses for Syllable Segmentation Task Based on the Preliminary Item

Condition...... 76

Table 9: Item Analyses for Syllable Segmentation Task Based on the Actual Item

Condition...... 77

Table 10: Item Analyses for Initial Isolation Task Based on the Preliminary Item

Condition...... 78 vii

Table 11: Item Analyses for Initial Isolation Task Based on the Actual Item

Condition...... 79

Table 12: Item Analyses for Phoneme Blending Task Based on the Preliminary Item

Condition...... 80

Table 13: Item Analyses for Phoneme Blending Task Based on the Actual Item

Condition...... 81

Table 14: Intercorrelations among the Phonological Awareness Tasks ...... 82

Table 15: Factors, Eigenvalues, Percentage of Variance Accounted for...... 83

Table 16: Factor Loadings for One-Factor Solution...... 84

Table 17: Factor Loadings for Two-Factor Solution after Varimax Rotation...... 85

Table 18: The Means and the Standard Deviations of Alphabet Sound Upper and Lower

Case Knowledge Tests...... 86

Table 19: Predictive Correlations between Phonological Awareness Tasks and Alphabet

Sound Upper and Lower Case Knowledge Tests ...... 87

Table 20: The Means, and the Standard Deviations of Phonological Awareness Tasks by

Gender Groups ...... 88

Table 21: The Means and the Standard Deviations of Phonological Awareness Tasks by

Ethnicity Group...... 89

Table 22: The Means and the Standard Deviations of Phonological Awareness Tasks by

Socioeconomic Group...... 90

viii

LIST OF FIGURES

Page

Figure 1: Developmental Sequence of Phonological Awareness ...... 91

Figure 2: Facets of Unitary Validity...... 92

Figure 3: Plot of Eigenvalues and Factors of Scree Test ...... 93

Figure 4: The Procedure for Assessment Construction and Validation...... 94 1

I. INTRODUCTION

Reading and Academic Performance

Research in early reading acquisition has received considerable attention because children’s early reading skills have a strong and continuous relationship with their later academic performance. Children who learn to read early and well are more likely to become familiarized with print and to increase knowledge domains (Cunningham &

Stanovich, 1997). On the other hand, children who experience difficulties in learning to read at early ages tend to continue their reading difficulties over time regardless of remedial services (Johnston & Allington, 1991) and delay learning in other academic areas which highly depend on their reading skills (Stanovich, 1986; Chall, Jacobs, &

Baldwin, 1990; Stevenson & Newman, 1986).

The Component of Reading Acquisition

No single factor determines the emergence of literacy because reading development involves complex cognitive levels and multiple activities. Some studies indicated positive and longitudinal correlations between oral language skills and reading

(Bishop & Adams, 1990). Other research suggests vocabulary skills significantly influence learning to read (Wagner, Torgesen, Rashotte, Hecht, Barker, Burgess,

Donahue, & Garon, 1997). Whitehurst and Lonigan (1998) proposed three different components of emergence of literacy named oral language skills, phonological processing abilities, and print knowledge. Lonigan, Burgess, and Anthony (2000) found that phonological sensitivity and letter knowledge explained 54 % of the variation in 2 children’s decoding skills. Regardless of different research suggestions on the components of emergence of literacy, a substantial amount of research has revealed a significant and continual relationship between phonological awareness and the acquisition of early reading and spelling (Bradley & Bryant, 1983; Goswami & Bryant,

1990). Much research has suggested that children’s implicit understanding of and ability to manipulate the sound system of language, which is known as phonological awareness, is a crucial precursor to the emergence of early literacy. Because of an important role of phonological awareness in young children, a considerable amount of research has tried to operationalize the concept of phonological awareness.

Overview

This study investigates measures of phonological awareness for pre-kindergarten children in terms of their psychometric characteristics. This study will focus on how framework for unitary construct validity suggested by Messick (1989) can be implemented in practice. Before the validation study, previous research on phonological awareness, including the relationship between phonological awareness and the early reading acquisition, developmental sequence of phonological awareness, and the validity study of phonological awareness, will be briefly reviewed in the next section.

3

II. PHONOLOGICAL AWARENESS

Definition of Phonological Awareness

Because phonological awareness involves understanding that words can be divided into segments of sound smaller than a syllable and learning about individual phonemes, one must know what a phoneme is in order to understand the concept of phonological awareness (Torgesen & Mathes, 2000). A phoneme is the smallest unit of sound system in a language which makes a difference in meaning. Phonemic awareness

– a subset of phonological awareness – refers to the awareness that spoken language consists of a sequence of phonemes (Yopp & Yopp, 2000).

Broadly speaking, phonological awareness refers to the sensitivity to or explicit awareness of and the ability to manipulate the sound units in spoken language. Thus, phonological awareness includes the ability to generate and recognize rhyming words, to count syllables, to segment a word into phonemes, to separate the beginning of a word from its ending. Beginning readers should understand the fundamental principle that speech can be segmented and these sound units can be represented by printed forms

(Liberman, Shankweiler, Fischer, & Carter, 1974). Without phonological awareness young children have difficulty in understanding how alphabetic transcription works, and consequently, their ability to learn to read is hindered (Torgesen, 1999; Blachman, 1994;

Liberman, Shankweiler, & Liberman, 1989).

4

The Role of Phonological Awareness in Reading Acquisition

Overwhelming evidence from a variety of populations and tasks has indicated a strong and specific relationship between phonological awareness and early acquisition of reading and spelling (Adams, 1990; Bradley & Bryant, 1983; Bryant, MacLean, &

Bradley, 1990; Goswami, & Bryant, 1990; Stanovich, 1992; Wagner & Torgesen, 1987).

Children who have better abilities in analyzing and manipulating rhymes, syllables, and phonemes are better at learning to read than children who have difficulties in acquiring these skills. The relationship between phonological awareness and early reading acquisition is present even after such factors as intelligence, vocabulary skills, and listening comprehension are partialled out (Bryant, MacLean, Bradley, & Crossland,

1990; Stanovich, 1992; Wagner & Torgesen, 1987).

Some researchers have explained that the complex relationship between the sounds of speech and the signs of print makes it difficult for young readers to perceive the phonemic segments in speech (Liberman, 1978; Torgesen & Mathes, 2000). For example, three segments of the written word lag overlap with one another (coarticulating) and create a single sound in speech production. Coarticulating the phonemes in words makes it difficult for beginning readers to identify phonemes as unique parts of speech.

Also, letters and phonemes do not always correspond to each other consistently, which means graphic symbols more or less represent the sounds of speech in different words

(Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967).

Torgesen and Mathes (2000) expound that phonological awareness is not the only determinant of the early acquisition of reading but it is a critical precursor to effective reading skills. Phonological awareness promotes children’s understanding of the 5 relationship between speech and alphabetic orthography. Children must understand that speech is comprised of sound segments at the level of phonemes in order to read the words in print (Blachman, 1994; Liberman, Shankweiler, & Liberman, 1989; Yopp &

Yopp, 2000). Also, phonological awareness helps children perceive the categories of common sounds that are represented by common letters. The ability to observe the correspondence between letters and sounds in words reinforces children’s knowledge of common spelling patterns and accurate recognition of whole words that come up in print repeatedly (Bryant, MacLean, & Bradley, 1990; Goswami, 1986, 1988; Torgesen &

Mathes, 2000). Finally, phonological awareness enables children to produce possible words in context from the partially sounded out words by elaborating similar phonemes in words. Indeed, children who are quick to develop the ability to analyze and to construct a connection between sound segments and letters almost invariably become better readers than children who have difficulties in developing these skills (Share &

Stanovich, 1995).

Developmental Sequence of Phonological Awareness

Numerous studies and intervention which used various tasks of phonological awareness have found that, regardless of task requirements, phonological awareness tasks account for a large portion of common variance of construct that underlies the measurement. In addition, these studies have demonstrated the different developmental levels of task difficulty (Adams, 1990; Stahl & Murray, 1994; Stanovich, Cunningham, &

Cramer, 1984; Yopp, 1988). Understanding the developmental sequence of phonological awareness is important because it is directly related to the issues of validity of 6 assessment. Different tasks involve different levels of cognitive and linguistic abilities or age-appropriateness, thus the child’s assessed levels of phonological awareness might be greatly determined by the complexity of the tasks (Backman, 1983; Burt, Holm, & Dodd,

1999).

Generally, the ability to analyze larger units (rhyme and syllable) is developed prior to the ability to analyze smaller units (phoneme). Hoien, Lundberg, Stanovich, and

Bjaalid (1995) outlined that sensitivity to rhyme is thought to be the beginning of the developmental continuum of phonological awareness, phoneme segmentation to be the end of the continuum, and syllable segmentation might be the intermediate level of the continuum. Children as young as 3 years of age show sensitivity to rhyme, which is a more global aspect of sound structure of words (Lonigan, Burgess, Anthony, & Barker,

1998; MacLean, Bryant, & Bradley, 1987). Children’s knowledge of nursery rhymes at age 3 is significantly related to the measure of rhyme detection a year later (MacLean et al., 1987), and early sensitivity to rhyme and alliteration predicts later awareness of phonemes which plays an important role in reading development (Bryant et al., 1990).

There is a ceiling effect on rhyme detection and production tasks at the kindergarten level, and most children are able to blend and segment words into the syllabic unit.

Nonetheless, they cannot segment the words into a series of phonemes at this age level

(Blachman, 1994; Stanovich et al., 1984; Yopp, 1988). By the end of first grade, the majority of children can manipulate phonemes. They can add, delete, or move phonemes and generate words. More specific developmental processes of phonological awareness can be found in Figure 1 (cf. Hill, 1999; Torgesen & Mathes, 2000).

7

Validity Test for Phonological Awareness Tasks

As discussed earlier, a great amount of research using various measures has focused on the concept of phonological awareness and has found convergence evidence that performance on phonological awareness tasks are intercorrelated with one another.

Furthermore, regardless of the measures that have been used, phonological awareness tasks shared a large portion of total variance, which in turn, provide evidence for construct validity of phonological awareness (Hoien et al., 1995; Stanovich et al., 1984;

Yopp, 1988). Two examples of test validity for phonological awareness tasks are briefly discussed in this section.

Yopp (1988) administered 10 commonly used phonological awareness tasks, including; rhyming task, auditory discrimination, phoneme blending, phoneme counting, phoneme deletion, phoneme segmentation, sound isolation, and word-to-word matching task, to 96 kindergarten children with an average age of 5 years, 10 months. She found that the phoneme deletion was the most difficult task, and the rhyming was the easiest task. She conducted a principal factor analysis with oblique rotation and found that the first factor accounted for 58.7 % of the variance and the second factor accounted for an additional 9.5 % of the variance. In addition, phoneme blending, phoneme counting, phoneme segmentation, and sound isolation all loaded highly on the first factor and the two phoneme deletion tasks loaded highly on the second factor. She labeled the first factor as “Simple Phonemic Awareness”, and the second factor as “Compound Phonemic

Awareness”. A stepwise regression analysis was also conducted, with the score on the learning rate test as the dependent variable and 10 tests of phonological awareness as 8 predictors. The sound isolation task explained 52 % of the variance, and phoneme deletion task explained 10 % of the variance in the learning rate test.

Hoien, Lundberg, Stanovich, and Bjaalid (1995) utilized a very large sample size to examine the differential validity of the different levels of phonological awareness. Six types of phonological awareness tasks including rhyme recognition, syllable counting, phoneme counting, initial phoneme matching, initial phoneme deletion, and phoneme blending were administered to 128 Norwegian children. The average age of the children was 6 years, 11 months. A principal factor analysis using varimax rotation revealed a three-factor solution. Initial phoneme matching, initial phoneme deletion, phoneme blending, and phoneme counting were found highly loaded on the first factor which accounted for 38.6 % of the variance. Syllable counting loaded highest on the second factor, and rhyme recognition loaded highest on the third factor. The second and the third factors accounted for 18.4 % and 17.6 % of the variance respectively. Hoien et al. (1995) concluded that the study results indicated preschool children without any formal reading instruction and with very limited reading skills showed phonemic awareness.

9

Purpose of the Study

The studies of Yopp (1988) and Hoien et al. (1995) used large sample sizes and included a variety of tasks to systematically investigate the concept of phonological awareness of 5 to 6 years-old children. Similarly, most of studies relating to phonological awareness have assessed preliterate children at the school entry, prior to formal reading instruction. Compared with this aspect, there has been much less research focused on the development of phonological awareness at the preschool age level, specifically at age of four; nevertheless, the considerable evidence has indicated that preschool children as young as the age of 3 show implicit knowledge of phonological awareness (Bryant et al., 1990; MacLean et al., 1987).

The purpose of this study was to conduct a validity study regarding the off-level use of The Phonological Awareness Test (Robertson & Salter, 1997) for identifying phonological awareness in preliterate pre-kindergarten children using Messick’s (1989) framework for unitary construct validity. Because validity is the most important consideration in a test development and use, traditional view of validity and six aspects of the unitary concept of validity proposed by Messick (1989, 1995) are briefly reviewed prior to the validation process for phonological awareness tasks.

10

III. VALIDITY

Validity is “the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (AERA, APA, & NCME, 1999, p. 9).

Accordingly, validation is the most crucial procedure in test development and use because it is a process of collecting evidence to support the intended interpretation of test scores and implications of the score meaning.

Traditional Conception of Validity

The conception of validity has gradually shifted from numerous specific to a few distinct validity types and finally to a unitary validity concept (Messick,

1989). Although there has been increasing emphasis on construct validity as a unitary conception of validity, three or four different types of validity have been commonly utilized in various assessment settings since the early 1950s. The traditional view of validity argues that the types or aspects of validity depend on the inferences to be drawn from the test scores and the implications of entailed test interpretations. These separate types of validity and the limitation of the traditional conception of validity are briefly discussed.

Content Validity

Content validity refers to the degree to which the content of test samples represents the content of a particular behavioral domain of interest. Content validity is primarily concerned with adequate sampling of the content of the domain. The knowledge and skills that are measured by the test items should be representative to the 11 larger domain of knowledge and skills. The other aspect of content validity involves the format of the test such as clarity of questions or directions and appropriateness of language. Content validity is evaluated based on the professional judgment about the domain relevance and representativeness of the content according to specific criteria or objectives. Based upon the agreement in judgments by a panel of content experts, test developers revise or select the final items. Hence, content validity is to specify the universe of item content and item – selection procedures (Messick, 1989).

Content validity is important because it accumulates judgmental evidence to support the domain relevance and representativeness of the test content which act upon the nature of score inferences supported by other evidence. However, Messick (1989) argues that using content validity as the solitary validity evidence has a critical limitation.

Content validity does not take into consideration the response processes, the internal and external structure of the test, or performance differences; thus, it does not provide enough evidence supporting inferences to be made from the test scores.

Criterion-related Validity

Criterion-related validity is the degree to which the test scores are systematically associated with one or more external criteria considered to directly measure the same variable. There are two aspects of criterion-related validity – predictive and concurrent criterion-related validity. Predictive validity refers to the extent to which the test scores predict the future performance on the criteria, and concurrent validity indicates the extent to which the test scores estimate the present performance on the criteria. Therefore, criterion-related validity is a matter of how the test scores accurately predict criteria performance. Criterion-related validity is evaluated based on the level of empirical 12 relationship, commonly estimated by correlations or regressions, between the test scores and criteria scores. For this reason, determining appropriate criteria is a critical step in criterion-related validation.

Criterion-related validity is not about the pattern of relationships between test scores and other measures, but rather it is about prediction which is more concerned with non-causal dependence. Furthermore, criterion-related validity relies very heavily on the empirical relationships with selected external measures. For this reason, criterion-related validity may be too narrow to reflect the definition of validity because it does not consider any other sources of evidence besides specific test – criteria relationships

(Messick, 1989).

Construct Validity

Construct validity refers to the extent to which test scores support the presence of the psychological construct that underlies the measurement. In this manner, construct validity is concerned with abstract and theoretical traits such as self-esteem, motivation, temperament, and creativity. Construct validation begins with the operational definition of the construct based on the literature reviews and theoretical reasoning. The process of operationalizing the concept is similar to the process of content validation. Operational definition is a process of defining the theoretical terms and specifying the hypotheses for the legitimate experimental procedures for applying a theory (Messick, 1989). After the construct is operationally defined, the hypotheses – the relationships between the measures of concepts – are logically and empirically examined. In this process, it is crucial to evaluate the test items for bias or construct-irrelevant variance which 13 systematically influence the test scores. Finally, empirical evidence is interpreted whether it is consistent with the hypotheses or rival theories.

Construct validity can be assessed by internal and external test structures, that examine the pattern of relationships among item scores or between test scores and other measures. Construct validity also involves study of performance differences over time, across groups, and different settings in response to experimental treatment. On that ground, construct validity is an integration of any evidence to support the meaning of the test scores (Messick, 1989).

Unified Conception of Validity

Traditional distinct types of validity – content, criterion-related, and construct validity – have been widely utilized in various assessment settings. However, it is common that inferences, to be drawn from the test scores, require multiple types of validation approach rather than just one (e.g., Cronbach & Meehl, 1955). Moreover, content validity as sole validity evidence is insufficient because it does not reflect on the internal and external test structures and response processes. Thus, it does not provide evidence that bears on inferences to be made from the test scores. Likewise, criterion- related validity strictly depends on the specific test – criterion relationships and does not consider any other sorts of evidence. On that account, Messick (1995) argues that the traditional conception of validity is fragmented and incomplete because it fails to take into consideration the evidence for the actual and potential consequences of score interpretation and use. In addition, he addresses that the types of validity are not alternatives but supplements of one another because all of these forms of evidence 14 fundamentally support the interpretation and implication of the test scores. Hence, the relation between the evidence and the inferences should determine the validation approach focus rather than a type of validity (Messick, 1989). This is why validity is identified as a unitary concept.

In Messick’s (1989, 1995) view, construct validity incorporates content relevance and representativeness as well as criterion-relatedness since information about the domain content relevance and about the specific criterion-relationships predicted by the test scores clearly influences score interpretation. Therefore, construct validity comprises almost all aspects of validity evidence. A unitary conception of validity should intermix considerations of content, criteria, and consequences into a construct framework to empirically test the rational hypotheses about the interpretation and utility of the test scores (Messick, 1989, 1995).

Messick’s new unified concept of validity heavily emphasizes on both score meaning and social values in test interpretation and use. Messick (1989, 1995) suggests six distinguishable aspects of construct validity to address the multiple and interrelated validity questions to justify score interpretation and use. There are content, substantive, structural, generalizability, external, and consequential aspects of construct validity.

Descriptions of these six aspects are outlined to guide the validation of phonological awareness tasks.

The Content Aspect of Construct Validity

Test content refers to the “themes, wording, and formats of the items, tasks, or questions on a test as well as guidelines for procedures regarding administration and scoring” (AERA et al., 1999, p. 11). Hence, the content aspect of construct validity 15 subsumes theoretical and empirical analyses of adequacy of content relevance, representativeness, and technical quality (Messick, 1989, 1995). This validation process is to gather the construct-relevant sources of task difficulty and to guide the rational development and scoring of performance tasks and other assessment formats.

The sources of invalidity are worth addressing because they can occur mostly during the theoretical and empirical domain of construct – the content aspect of validation (Benson, 1998). According to Messick (1989, 1995), one of the threats to validity is known as “Construct Underrepresentation”. Construct underrepresentation occurs when the assessment is defined too narrowly and fails to adequately cover the important theoretical domain of construct. Another threat to validity is “Construct-

Irrelevancy”. Construct-irrelevant variance is when the assessment is defined too broadly and contains excess reliable variance associated with other distinct construct in addition to the focal construct. That is, aspects of the task are extraneous to the focal construct and make the task irrelevantly difficult or easy for particular individuals or groups.

In essence, evidence about content is primarily concerned with the basis for specifying the boundaries and structure of the construct to be assessed. The construct and test content domain are carefully evaluated by a panel of experts’ professional judgments and documentation of which addresses the potential sources of irrelevant difficulty or easiness that require further analysis as well as sample domain processes in terms of their functional importance (AERA et al., 1999; Messick, 1989, 1995).

On that ground, one needs to consider the definition of phonological awareness – the sensitivity to or awareness of, and the ability to manipulate the sound units in spoken language. Then, phonological awareness tasks should be designed to assess individual’s 16 awareness of and ability to manipulate the spoken language segments which make up words. Regarding the sources of invalidity, understanding the developmental sequence of phonological awareness in children is important because the difficulty and complexity of the tasks directly influence children’s performances. The age of subjects and demographic characteristics should be addressed in this validation step.

The Substantive Aspect of Construct Validity

The substantive aspect of construct validity requires engagement between judged content relevance and representativeness and empirical response consistency or performance regularity in the assessment tasks (Loevinger, 1957; Messick, 1989).

Theoretical and empirical analyses of response processes provide evidence for appropriate sampling of domain and accrue empirical evidence for sampled processes that are actually engaged by respondents in task performance.

Inferences about processes involved in performance are generally developed by analyzing individual responses such as eye movements, response times, performance strategies, or responses to particular items. Empirical evidence of response consistency also derives from correlation patterns among parts of the test and between the test and other variables or from consistency in response times for task segments (AERA et al.,

1999; Messick, 1995). In addition to evaluating the response in tasks, the scoring rubrics or scoring guidelines should be carefully reviewed for the appropriateness of scoring processes to the intended interpretation or construct definition.

In brief, the matter of test content entails not only the content representativeness of the construct measure but also the process representation of the construct and the degree to which these processes are reflective of construct measurement. The content 17 representativeness of the test items need to be assessed in terms of the empirical domain structure which underlies the ultimate test form and score interpretation (Messick, 1989,

1995). Therefore, the scoring and recording response process should be clearly indicated.

The Structural Aspect of Construct Validity

The structural aspect of validity refers to “the extent to which structural relations between test items parallel the structural relations of other manifestations of the trait being measured” (Loevinger, 1957, p. 661). The analyses of internal structure of a test are to determine the degree of the relationships among test items and the intended structure of the theoretical domain. Thus, the structural aspect of construct validity examines the consistency or fidelity of the scoring structure related to the structure of the construct domain.

The structural aspect can be assessed by various statistical methods such as intercorrelation among the items and subscales, exploratory and confirmatory factor analysis, and item response theory. The specific types of analysis and interpretations of the results rely on the implication and utility of the test scores (AERA et al., 1999). For instance, if a set of test items of increasing difficulty is of interest, empirical analyses of the number of items answered correctly or the pattern of scoring key should be provided.

The structural aspect of validity also includes the appropriateness and adequacy of scaling and equating procedures using item response theory. The adequacy of scaling is the degree to which the relative weights for different types of items are consistent with the construct interpretation of the test results (Miller & Linn, 2000).

Indeed, the structural component of construct validity includes both the selection or construction of relevant assessment tasks and the logical development of construct – 18 based scoring criteria, guidelines, and rubrics. The internal structure of the assessment including intercorrelation among the items and subtest, degree of homogeneity in the test, and the dimensionality of the interitem structure should be consistent and reflect the internal structure of the construct domain (Messick, 1989, 1995). In this aspect, item analyses including item difficulty, item discrimination, internal consistency, and factor analysis should be reviewed in addition to the scoring guidelines or the procedure of scoring on the phonological awareness tasks.

The Generalizability Aspect of Construct Validity

Generalizability is concerned with the numerous factors such as sampling fluctuations and reliability of measures that contribute to systematic variability in behavior and performance. Generalizability refers to the degree to which a construct interpretation empirically generalize to and across population groups (population generalizability), situations or settings (ecological generalizability), time periods

(temporal generalizability), and task domains (task generalizability) (Messick, 1989).

For example, ecological generalizability involves the sources of invalidity from the standardization of test materials and administration conditions. As another example, population generalizability examines the test scores across random samples of diverse ethnic groups in order to indicate that the test measures the same construct in these populations. In addition, the limits of score meaning are also influenced by the degree of generalizability across observers or raters of the task performances.

The degree of generalizability of construct meaning across contexts can be evaluated by assessing the degree to which test scores reflect comparable patterns of relationships with other measures or similar responsiveness to treatment across groups, 19 situations, times, and tasks (Messick, 1989). Also, generalizability theory is the application of analysis of variance models and random variance components to estimate universe score variance which examines the consistency of the assessment procedures under different conditions of population groups or tasks (Miller & Linn, 2000). The generalizability aspect of validity evidence is determined by the degree of correlation of the assessment tasks with other tasks representing the construct, by the nature of the construct assessed, and by the scope of its theoretical applicability (Messick, 1989, 1995).

In summary, generalizability is primarily concerned with sources of measurement error associated with the sampling of tasks, occasions, and raters which underlie traditional reliability. The generalizability study presents an evidential basis for judgments of the test interpretation and use across various contexts.

The External Aspect of Construct Validity

The external aspect refers to the degree to which the relationships of test scores with other measures and non- assessment behaviors or performances reflect the expected relations in the theory of construct being assessed (Loevinger, 1957). Indeed, “the construct represented in the assessment should rationally account for the external pattern of correlations” (Messick, 1995, p. 746).

The external component of validity evidence fundamentally depends on the correlations between the total score of assessment and any subscores. Accordingly, the external aspect can be established by the theoretical bases for the obtained patterns and by structural equation models to reproduce the observed correlations in construct – consistency. According to Benson (1998), multitrait-multimethod matrix procedure connects the structural and external stages of validation. The multitrait-multimethod 20 matrix generates two important correlation patterns. One is the “convergent validity coefficient”, which indicates the relationships between the test scores and other measures of the same construct on theoretical grounds. Another correlation pattern is the

“discriminant validity coefficient” that specifies the relationships between the test scores and measures of distinct constructs (AERA et al., 1999; Benson, 1998; Messick, 1995).

In addition, group differentiation also can be relevant if the theoretical construct suggests the presence or absence of the group differences in the proposed test interpretation.

Contrasting the mean scores of gender, diverse ethnic groups, and socio-economic status are examples of this approach.

In short, the meaning of the test scores is verified externally by assessing the degree to which the relevance of the potential relationships with other criterion measure in the stage of external aspect of validation. The test validation in essence is to insure that empirical evidence of such relations attest to the scores for the applied purpose.

The Consequential Aspect of Construct Validity

The consequential aspect appraises the intended and unintended consequences of test uses and implications of score interpretation. AERA et al. (1999) addresses the distinction between validity evidence about consequences and issues of social policy. If consequences of assessment are traced to any sources of invalidity such as construct underrepresentation or construct-irrelevant variance, it is directly related to validity.

Hence, consequences as validity evidence affect or change the score interpretations and implications of score meaning (Miller & Linn, 2000).

Consequences of assessment are either intended or unintended. Intended consequences include improved instructional or educational practices, a test used in 21 placement decisions, or selections of effective treatment in therapy. On the other hand, unintended or adverse consequences include bias in the assessment, unfairness of assessment, and misinterpretations for certain individuals or groups. Fundamentally, the measurement is concerned with any negative implication on individuals or groups that are derived from any sources of invalidity. For example, low scores should not occur because the test measures unrelated knowledge or skills of domain construct. Also, low scores should not occur because the assessment contains something sensitive to particular individuals or groups unintended to be part of the construct.

It is clear that the consequential aspect of validity evidence comprises the value implication of score interpretations as a basis for actions in addition to actual and potential consequences of test use (Messick, 1995). Since consequences as a source of evidence for validity affect the inferences and use of the assessment, the value implications of score interpretations should be addressed as a part of validity framework

(Messick, 1989; Miller & Linn, 2000).

Validity as Integrated Evidence

The six aspects of construct validity are emphasized as a unified concept that addresses score-based interpretations, utility of scores, and value implications as a basis for action. Validity rationale eventually accumulates various sources of evidence to provide a sound scientific basis for the intended interpretation of test score for specific use (AERA et al., 1999). Thus, integrating various components of evidence involves appropriate sampling of domain, relevant assessment task construction procedures, 22 adequate score reliability, proper test administration and scoring procedures, accurate score scaling and equating, standard setting, and careful attention to test invalidity.

These aspects of validity should be viewed as interdependent and complementary forms of validity evidence rather than distinct and substitutable validity types. Indeed, evidence relevant to all of the six aspects need to be integrated into an overall validity judgment to support score-based interpretations and action implications. Once again, the unified concept of validity brings considerations of content, criteria, and consequences together into a construct framework for testing rational hypotheses about theoretical and score-based inferences (Messick, 1989).

Facets of the Unitary Validity

The unified concept of validity is highlighted because it integrates the appropriateness, meaningfulness, and usefulness of score-based inferences. Messick

(1989, 1995) suggests two interconnected facets of the unitary validity concept as a way of cutting and combining validity evidence. The facets of validity enables the prevention of excessive reliance on selected forms of evidence and emphasizes the supplementary role of content- and criterion-related inferences to applied decisions and actions based on the test scores.

The sources of justification of the testing (evidence or consequence) and the function or outcome of the testing (interpretation or use) generates a four-fold classification as presented in Figure 2. The evidential basis of test interpretation is construct validity because construct validity means evidence and rationales support the score meaning. The evidential basis of test use is also construct validity because it 23 involves the score meaning. Also, the evidential basis of test use is supported by evidence for the relevance and utility of the test to the specific applied purpose and setting. The consequential basis of test interpretation is the evaluation of value implications of score meaning and is construct validity since the score interpretation is necessary to assess the value implications. Finally, the consequential basis of test use is the evaluation of both actual and potential social consequences of applied testing. The social consequences also involve evidence of score meaning, of relevance, and of utility.

24

IV. METHOD

This study utilized data obtained an on-going study by Hamilton, Schwanenflugel,

Neuharth – Pritchett, and Restrepo in pre-kindergarten literacy development. The descriptions presented here were based on the information provided by these original investigators.

Participants

A total of 415 pre-kindergarten children (213 boys and 202 girls) participated in the study. The initial investigators recruited participants at the pre-kindergarten registration in spring of 2002 in three Northeastern Georgia school districts. Children were attending 26 public elementary schools in these three school districts. The age of children ranged from 4 years to 5 years, 7 months with an average age of 4 years, 6 months at the time of the school started in the month of August of the year of 2002. The ethnic population was diverse; 41.7 % (n = 173) were African-American, 33.4 % (n =

139) were Caucasian, 18.5 % (n = 77) were Hispanic, 5 % (n = 21) were Asian, and 1.4

% (n = 6) were Bi-Racial. 75.8 % (n = 314) of the children spoke English as a first language, 20.4 % (n = 85) spoke Spanish, and 3.8 % (n = 16) spoke other than English as a first language. Children were predominantly drawn from a low to lower-middle socio- economic class population. 32.9 % (n = 137) of children were reported receiving free or reduced lunch, and 71% (n = 295) of children’s family were reported earning less than

$25,000 per year. 25

The majority of children in this age population did not have any detectable letter knowledge prior to pre-kindergarten level. None of the children were acquired any reading skills at the pre-kindergarten age level for the given tasks.

Materials

Phonological Awareness Tasks

A subset of The Phonological Awareness Test (Robertson & Salter, 1997) was used to assess phonological awareness of pre-kindergarten children in this study. The

Phonological Awareness Test was designed to diagnose deficits in phonological processing and phoneme-grapheme correspondence. The intended population of The

Phonological Awareness Test is five through nine years of age. The Phonological

Awareness Test included rhyming, segmentation, isolation, deletion, substitution, blending, graphemes, and decoding subtests.

Eight phonological awareness tasks were drawn from The Phonological

Awareness Test by the initial investigators, Schwaneneflugel and Blake. The initial investigators included the tasks that were considered to be potentially significant predictors of reading ability in the previous studies and the tasks that to be included in the intervention. The rhyming discrimination, sentence segmentation, syllable segmentation, initial isolation, syllable blending, phoneme blending, consonant graphemes, and long and short vowels graphemes were included to assess the child’s phonological awareness in this study. However, instructions were modified slightly and ceiling rules were created because of the age of the participants. Each of the tasks is described in detail as follows.

The actual tasks items and correct responses are presented in Appendix. 26

Rhyming Discrimination: The rhyming discrimination task was to measure the child’s ability to identify rhyming words presented in pairs. The examiner said to the child, “I am going to say two words and ask you if they rhyme. Listen carefully. Do these words rhyme? Fan – man.” Then the child should respond with either “yes” or

“no”. The examiner indicated whether each response was correct or incorrect, and provided the correct response, “Fan – man. Yes, they do rhyme.” If the child responded with other than “yes” or “no”, the examiner repeated the question to elicit a “yes” or “no” response. The stimulus phrase, “Do these words rhyme?” could be repeated, but no other prompts were given to the examinees. The actual ten task items were administered to the child who responded correctly to at least one of the three practice items. Thus, the child who responded to all three practice items incorrectly was excluded from the task administration. Practice items included “Fan – man”, “Fan – tan”, and “Fan – dog”.

Only words that the child responded correctly on their own were scored as correct, with a possible score range of 0 to 10, excluding the three practice items. The examiner stopped administering the task if there were three consecutive wrong items in the child’s responses.

Sentence Segmentation: The purpose of sentence segmentation task was to assess the child’s ability to divide sentence into their constituent words. The examiner told the child, “I am going to say a sentence, and I want you to clap one time for each word I say. My house is big. Now, clap it with me.” The examiner said the sentence again and clapped as she/he said each word. “My – house – is – big. Now, you try it by yourself. My house is big.” The child should respond with clapping four times, while she/he repeated the sentence word by word. The examiner indicated whether the child’s 27 response was correct or incorrect. If the child responded incorrectly, the examiner repeated the sentence and asked the child to clap with her/him. The stimulus phrase,

“Clap one time for each word I say.” was given to the examinees without any other prompts. Three practice items, including “My – house – is – big.”, “My – name – is -

______.", and "I – like – dogs.” were given prior to the actual task items. However, the task administration took considerably long time for this age population. The initial investigators decided that this task was too long for the concentration level at this age population. Sentence segmentation task was dropped from the battery after the task was administered to about 50 pre-kindergarten children.

Syllable Segmentation: The purpose of the syllable segmentation task was to assess the child’s ability to divide the words into syllables. The examiner told the child,

“I am going to say a word, and I want you to clap one time for each word part or syllable

I say. Saturday. Now, clap it with me.” The examiner said the word again and clapped once as she/he said each syllable. “Sat – ur – day. Now, try it by yourself.” The words, including “Saturday”, “Friday”, and “Dog” were given as practice items. The child should respond with claps, one for each syllable as the child said the word by syllable.

The examiner acknowledged a correct response. If the child responded incorrectly, the examiner repeated the word and asked the child to clap with her/him. The stimulus phrase, “Clap one time for each syllable in the word.” was repeated, but no other prompts were given to the child. After three practice trials, the actual task items were administered to the child who responded to at least one of the three practice items correctly; hence, the child was excluded from the task administration if he/she responded to all three practice items incorrectly. Only words that the child responded to correctly 28 on their own were scored as correct. The examiner stopped the task administration if the child responded to three consecutive items incorrectly. The child’s score was the number of correct responses, with a possible score range of 0 to 10, excluding the three practice items.

Initial Isolation: The initial isolation task was to measure the child’s ability to identify the initial phoneme in a word. The examiner began the task by saying, “I am going to say a word, and I want you to tell me the beginning or first sound in the word.

What is the beginning sound in the word CAT?” The child should respond with /k/ or

“kuh”. The examiner gave feedback by saying, “That is correct.” or by saying, “The beginning sound in CAT is /k/.” The stimulus phrase, “What is the beginning sound in

______.”, was given to the child. The examiner emphasized the word “sound” if the child gave letter names; however, she/he scored the item incorrect and did not repeated the item. After the three practice trials, including “CAT”, “MAD”, and “JANE”, the examiner administered the actual task items to the child who correctly responded to at least one of the three practice items. The items that the child responded to correctly on their own were scored as correct. Score had a possible rage of from 0 to 10 correct, excluding the three practice items. The task administration stopped if the child responded to three consecutive items incorrectly.

Syllable Blending: The syllable blending task was designed to assess the child’s ability to blend individually presented syllables to form a word. The examiner told the child, “I will say the parts of a word. You guess what the word is. What word is this?

Ta – ble.” The examiner paused for one second between syllables. If the child responded with table as a whole word without pausing between syllables, the child’s response was 29 scored as correct. The examiner indicated whether each response was correct or incorrect.

If the child repeated the word in parts, the examiner told the child, “Say it faster, like this, table.” Three practice items, including ta – ble, mo – ther, and he – llo were given to the examinees before the administration of actual ten task items. However, the task administration took too long for this age population. The task was dropped from the battery after it was administered to 50 pre-kindergarten children.

Phoneme Blending: The purpose of the phoneme blending task was to measure the child’s ability to blend phonemes together to form a word when phonemes were presented individually. The examiner told the child, “I will say the sounds of a word.

You guess what the word is. What word is this? /P – o – p/.” The examiner paused for one second between sounds. The child should respond with the word pop without pausing or distorting any sounds. If the child repeated the sounds as given by the examiner, she/he was told, “Say it faster, like this pop.” Each child was given three practice items, including /p – o – p/, /d – o – g/, and /c – a – t/, prior to administration of the test items. The examiner acknowledged a correct response. If the child responded incorrectly, the examiner said, “/p – o – p/ is pop.” The stimulus phrase, “What word is this?” was given to the child without any other prompts. The examiner administered the actual task items to the child who responded correctly to at least one of the three practice items. The child’ score was based on the total number of correct responses, with a possible range of 0 to 10 correct, excluding the three practice items. When there were three consecutive wrong items in the child’s responses, the examiner stopped administering the task. 30

Consonants Graphemes: The consonants graphemes task was to assess the child’s knowledge of sound and symbol correspondence when the letters were individually presented. The task was not given to the children who did not know the letters in his or her name. The examiner told the child, “I am going to show you some letters. I want you to tell me what sound each letter makes.” Some of the letters had two acceptable sounds. For instance, if the child responded with /k/ or /s/ for the letter c, the examiner scored the item as correct. But, the consonants graphemes task took too long to administrate, and the initial investigators decided to drop the consonant graphemes task from the battery after administered to 50 children.

Long and Short Vowels Graphemes: The purpose of this task was to measure the child’s knowledge of sound and symbol correspondence of vowels. The examiner showed the vowels cards to the child and said to the child, “I am going to show you some letters. I want you to tell me what sound each letter makes.” The task was given to the children who knew the letters in his or her name. If the child responded with one vowel sound, the examiner said to the child, “Tell me the other sound this letter makes.” There were no practice items for this task. However, the administration for this task was too long for this age population. The task was dropped from the battery after the task was administered to 50 children.

Criterion Measure

There are many different measures that can be employed to appraise the criterion- related validity. The initial investigators developed an alphabet knowledge test to measure the child’s ability to identify the letter names and sounds of the alphabet. An 31 alphabet test was included to determine the predictive validity of each of phonological awareness tasks.

Alphabet Knowledge Test: The alphabet test assessed the child’s knowledge of alphabet letter names and letter-sound correspondence. The examiner showed the child a list of upper- and lowercase letters presented in random order, pointed to each letter sequentially, and asked the child, “Do you know what this is?” If the child responded with the correct letter name, he or she was then asked, “What sound does it make?” The examiner recorded the child’s responses as correct or incorrect. Any correct pronunciation of the given letter was counted as letter sound knowledge. For example, if the letter ‘C’ or ‘c’ was pronounced /k/, /s/, or /ch/, the child’s response was scored as correct for letter sound knowledge. The alphabet test included four subtests: letter name knowledge (upper and lower case) and letter sound knowledge (upper and lower case). Each subtest consisted of 16 letters presented in random order. Only the letters that the child responded to correctly without assistance were scored as correct. Scores on each alphabet subtest ranged from 0 to 16.
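The letter-sound scoring rule, under which any acceptable pronunciation of a letter is counted as correct, can be illustrated with a toy scoring function. The acceptable-sound table below is a hypothetical fragment, not the study's actual scoring key.

```python
# Toy illustration of letter-sound scoring: a response is correct if it matches
# any acceptable pronunciation of the letter. Only a hypothetical fragment of a
# scoring key is shown.

ACCEPTABLE_SOUNDS = {
    "c": {"/k/", "/s/", "/ch/"},  # the text's example: C may be scored /k/, /s/, or /ch/
    "a": {"/a/", "/ei/"},         # hypothetical short and long vowel pronunciations
    "b": {"/b/"},
}

def score_letter_sound(letter, response):
    """Return 1 if the response is an acceptable sound for the letter, else 0."""
    return int(response in ACCEPTABLE_SOUNDS.get(letter.lower(), set()))

print(score_letter_sound("C", "/s/"))  # 1
print(score_letter_sound("b", "/d/"))  # 0
```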

Procedure

Assessment Procedure

Assessment with the phonological awareness tasks took place over a three-month period, from August through October of the 2002 pre-kindergarten year. Fifteen examiners were trained by the initial investigators for two days prior to the assessment sessions. The initial investigators observed the assessment process for a week to ensure that the examiners fully understood the administration and scoring procedures.

The number of sessions needed to complete the assessment depended on the examinees’ concentration and frustration levels. Each of the phonological awareness tasks was administered individually in a quiet room. Items in each task were drawn directly from The Phonological Awareness Test (Robertson & Salter, 1997) and were given to the examinees in sequential order. The tasks themselves were administered in random order to avoid order effects. All examinees were given three practice items prior to the actual task items. The ten actual task items were administered only to examinees who responded correctly to at least one of the three practice items. In consideration of the examinees’ age, frustration, and concentration levels, task administration was stopped if the examinee responded to three consecutive items incorrectly. If the child was losing track of the task, the examiner went back to the practice items to remind the child of the task.

The criterion measure, the alphabet knowledge test, was given to the examinees during January and February of 2003. The alphabet test was administered by a new set of assessors who were similarly trained.

Validation Procedure

The validation study for the phonological awareness tasks focused on the content, substantive, structural, and external aspects of construct validity proposed by Messick (1989). Each aspect of the validation procedure is briefly reviewed below.

The content aspect of validity began with a literature review of the relationship between phonological awareness and the reading skills of three- to seven-year-old children.

The initial investigators selected phonological awareness tasks that were considered to be related to later reading skills. The content aspect of construct validity was enhanced by a pilot study with 19 pre-kindergarten children and 11 kindergarten children.

The substantive aspect of construct validity focused on the age-appropriateness of task administration. The initial investigators reconstructed the guidelines for task administration and scoring procedures. Because the age population in this study was younger than the intended population of The Phonological Awareness Test, the investigators set ceiling rules for all subtasks. The actual task items were administered only to examinees who responded correctly to at least one of the three practice items.

Moreover, if the examinees responded to three consecutive items incorrectly, or if they showed symptoms of frustration, the examiners stopped the task administration. The examiners were trained for two days by the initial investigators on the phonological awareness task administration and scoring procedures. The assessment process was observed by the investigators to ensure that the examiners fully understood the administration and scoring procedures. In addition, the mean performances and the standard deviations were calculated, as well as internal consistency, using coefficient alpha.

The structural aspect of construct validity was established by empirical analyses of item difficulty, item discrimination, and intercorrelations among the tasks.

Factor analysis was also conducted to evaluate the internal structure of the assessment.

Finally, as part of the external aspect of validity, criterion-relatedness was evaluated by a multiple regression analysis with the total score on the alphabet upper- and lowercase sound knowledge test as the dependent variable and the scores on the phonological awareness tasks as the independent variables. In addition to the multiple regression analysis, the correlation coefficients between the alphabet name and sound knowledge tests and the phonological awareness tasks were calculated. The external aspect of construct validity also included group differentiation in phonological awareness performance by gender, ethnicity, and socioeconomic status.


V. RESULTS

The Content Aspect of Construct Validity

The descriptions of the phonological awareness tasks are presented in the Appendix. In addition to the task descriptions, the Appendix displays the items and correct responses, including the three practice items and the ten actual items.

The Substantive Aspect of Construct Validity

Descriptive Statistics

Table 1 and Table 2 summarize subjects’ performances on the tasks of phonological awareness. The possible maximum scores, the mean scores, and the standard deviations are presented, as well as the internal consistency of each task for this sample.

Table 1 is based on scores that took the practice items into consideration. Recall that the actual task items were administered only to subjects who responded correctly to at least one of the three practice items. If the subject was given the actual task items, an additional first item, labeled here the ‘preliminary item,’ was scored as correct. When the computations of the means, the standard deviations, and the reliabilities included the preliminary item, this is called the ‘preliminary item condition.’ Hence, Table 1 has a possible score range of 0 to 11, and a score of 0 indicates that the child responded to all three practice items incorrectly. Table 2 summarizes subjects’ performance based on the actual task items. When the computations of the means, standard deviations, and reliabilities were based on only the ten actual task items, this is called the ‘actual item condition.’ The possible score range in the actual item condition was 0 to 10.
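The two scoring conditions can be made concrete with a short sketch. The data structures below are assumptions for illustration; they are not the study's records.

```python
# Sketch of the two scoring conditions: the 'preliminary item' is scored 1 when
# the child passed at least one of the three practice items, giving a 0-11 range;
# the 'actual item condition' counts only the ten actual items (0-10 range).

def preliminary_condition_score(practice_correct, item_responses):
    if not any(practice_correct):
        return 0                       # all three practice items incorrect
    return 1 + sum(item_responses)     # preliminary item scored as correct

def actual_condition_score(item_responses):
    return sum(item_responses)         # actual items only

practice = [False, False, True]                 # hypothetical child who qualified
items = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]          # 1 = correct, 0 = incorrect
print(preliminary_condition_score(practice, items))  # 4 (1 preliminary + 3 actual)
print(actual_condition_score(items))                 # 3
```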

In both the preliminary item condition and the actual item condition, the rhyming discrimination task had the highest mean score (M = 3.64, SD = 3.88 and M = 3.10, SD = 3.46, respectively). On the other hand, the initial isolation task had the lowest mean performance among the tasks in both conditions (M = 0.87, SD = 2.54 and M = 0.68, SD = 2.29, respectively). In the actual item condition, the phoneme blending task also had a low mean score of 0.74, with a standard deviation of 1.86.

Task Reliability

The reliability of each phonological awareness task was estimated by coefficient alpha. Table 3 displays the coefficient alpha for each task, as well as the standard error of measurement, in the preliminary item condition, which took the three practice items into consideration. Table 4 presents the coefficient alpha and standard error of measurement for each task based on the actual item condition. According to Hills (1981), the reliability coefficient should be at least .85 if the test is to be used to make decisions about individuals. The reliability coefficients indicated that all four phonological awareness tasks had high internal consistencies, with α > .85. In both the preliminary item condition and the actual item condition, the initial isolation task had the highest internal consistency, with coefficient alphas of .97 and .98, respectively. In contrast, the syllable segmentation task had the lowest internal consistency, with coefficient alphas of .89 and .88, respectively.
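Coefficient alpha and the standard error of measurement can be computed directly from an item-response matrix, as the sketch below illustrates with simulated 0/1 data. The simulation is purely illustrative and does not reproduce the study's data.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_subjects, n_items) matrix of 0/1 scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def sem(items):
    """Standard error of measurement: SD_total * sqrt(1 - alpha)."""
    totals = np.asarray(items).sum(axis=1)
    return totals.std(ddof=1) * np.sqrt(1 - cronbach_alpha(items))

rng = np.random.default_rng(0)
ability = rng.normal(size=(200, 1))                             # simulated latent ability
data = (ability + rng.normal(size=(200, 10)) > 0).astype(int)   # ten correlated items
print(round(cronbach_alpha(data), 2), round(sem(data), 2))
```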


The Structural Aspect of Construct Validity

Item Analyses

All of the items on the phonological awareness tasks were dichotomously scored.

The difficulty level of each task was obtained by dividing the mean total score by the number of items on the task. Table 5 displays the mean difficulty levels. Examinees experienced the greatest difficulty with the initial isolation task, which required identifying the beginning phonemes in words (P = .079 in the preliminary item condition and P = .067 in the actual item condition). The rhyming discrimination task proved to be the easiest among the tasks (P = .330 in the preliminary item condition and P = .310 in the actual item condition).

Because examinees experienced great difficulty with some of the tasks, item analyses were conducted based on the number of examinees who actually responded to each item, in addition to the total number of examinees. Item difficulty corresponded to the proportion of examinees who responded to the item correctly. The point-biserial correlation between an item score and the total score was used for item discrimination. A point-biserial correlation coefficient of .350 or greater is considered to differentiate relatively high-ability examinees from relatively low-ability examinees. None of the items across the phonological awareness tasks had an item discrimination of less than .350. The results are presented for the respective tasks below.
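Both statistics are straightforward to compute from a scored data matrix. The sketch below, again with simulated data, illustrates item difficulty as the proportion correct and item discrimination as the point-biserial correlation of each item with the total score; the sample size merely mirrors the study's N.

```python
import numpy as np

def item_statistics(items):
    """items: (n_subjects, n_items) 0/1 matrix.
    Returns (difficulty, discrimination), one value per item."""
    items = np.asarray(items, dtype=float)
    difficulty = items.mean(axis=0)               # proportion answering correctly
    total = items.sum(axis=1)
    # Point-biserial correlation of each dichotomous item with the total score.
    discrimination = np.array([np.corrcoef(items[:, j], total)[0, 1]
                               for j in range(items.shape[1])])
    return difficulty, discrimination

rng = np.random.default_rng(1)
ability = rng.normal(size=(415, 1))
data = (ability + rng.normal(size=(415, 10)) > 0.5).astype(int)
p, r_pb = item_statistics(data)
print(np.round(p, 2))
print(np.round(r_pb, 2), "items below .35 cutoff:", (r_pb < 0.35).sum())
```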

Rhyming Discrimination: Tables 6 and 7 display the results of the item analyses on the rhyming discrimination task. Item discrimination ranged from .484 to .823 in the preliminary item condition and from .471 to .817 in the actual item condition. Approximately 46% of the examinees responded to all three practice items incorrectly and did not qualify to take the task. Examinees were more likely to have difficulty detecting the non-rhyming words than detecting the rhyming words. The non-rhyming words had item difficulty levels of .169 to .222 based on the total number of examinees, and of .393 to .484 based on the number of examinees who actually responded to the items. Although the levels of item difficulty were assumed to decrease systematically as the task administration progressed, the item difficulties seemed to be unsystematically distributed.

Syllable Segmentation: About 54% of the examinees responded correctly to at least one of the three practice items. Tables 8 and 9 show the item difficulty and item discrimination for the syllable segmentation task. In the preliminary item condition, item discrimination ranged from .466 to .727; in the actual item condition, it ranged from .475 to .742. Examinees had greater difficulty with words of more segments (e.g., watermelon or kindergarten) than with words of fewer segments (e.g., pizza or candy). All of the four-segment words had item difficulties of less than .100 when the item analyses were based on the total number of examinees. On the other hand, those items had slightly higher item difficulties of .162 to .204 when the analyses were based on the number of examinees who actually responded to the items. The levels of item difficulty seemed to be unsystematically distributed on the syllable segmentation task.

Initial Isolation: Examinees had the greatest difficulty with the initial isolation task. Only 19% of the examinees responded correctly to at least one of the three practice items. Tables 10 and 11 summarize the item difficulty and item discrimination for the initial isolation task. Item discrimination ranged from .609 to .953 in the preliminary item condition and from .839 to .955 in the actual item condition. When the item analyses were conducted based on the total number of examinees, the levels of item difficulty seemed to decrease systematically, and none of the actual task items had a difficulty level greater than .083. In contrast, the levels of item difficulty seemed to be unsystematically distributed, and increased dramatically, when the item analyses were based on the number of examinees who actually responded to the items. The initial isolation task seemed to be too difficult for this age population.

Phoneme Blending: Tables 12 and 13 display the item analyses for the phoneme blending task. The actual items of the phoneme blending task were administered to about 31% of the total examinees, indicating that about 69% of the examinees responded to all three practice items incorrectly. Item discrimination ranged from .566 to .772 in the preliminary item condition and from .577 to .755 in the actual item condition. Although more examinees responded correctly to at least one of the three practice items on the phoneme blending task than on the initial isolation task, examinees seemed to have more difficulty with the actual task items on the phoneme blending task. When the analyses were based on the number of examinees who actually responded to the items, none of the items had an item difficulty greater than .50 except the first item (P = .598). Furthermore, the levels of item difficulty seemed to decrease systematically whether the analyses were based on the number of examinees who actually responded or on the total number of examinees. The phoneme blending task also seemed to be too difficult for this age population.


Task Intercorrelations

The interrelationships among the phonological awareness tasks are shown in the correlation matrix in Table 14. The correlation coefficients were computed based on the actual item condition. Using the Bonferroni approach to control for Type I error across the six correlations (.05/6 = .0083), all of the tasks were significantly correlated with one another. The tasks that correlated the highest were the initial isolation task and the phoneme blending task (r = .51, p < .001). The syllable segmentation task and the phoneme blending task had the lowest correlation (r = .32, p < .001). The percentage of variance accounted for by the significant correlations ranged from 10.2% to 26%, indicating medium to large relationships (J. Cohen & P. Cohen, 1983).

Factor Analysis

A principal component factor analysis was carried out on the correlation matrix of the phonological awareness tasks (see Table 14 for the correlations). The Kaiser-Meyer-Olkin measure of sampling adequacy of .722 indicated that the correlation matrix was reasonably suitable for factoring. Two criteria were used to determine the number of factors to rotate: the eigenvalues-greater-than-one criterion and the scree test. Table 15 displays the eigenvalues and the percentage of variance accounted for. The eigenvalues indicate the variance accounted for by each factor, and SPSS extracts the number of factors that have eigenvalues greater than one (Green, Salkind, & Akey, 1997). Only the first factor exceeded the eigenvalues-greater-than-one criterion, and it accounted for 54.8% of the total variance. Table 16 presents the factor loadings under the eigenvalues-greater-than-one criterion.

The plot of eigenvalues indicated that a two-factor solution might also be appropriate, especially given that the second factor accounted for an additional 18.3% of the variance (see Figure 3). Two factors were therefore extracted by specifying the number of factors in the analysis and were rotated using a varimax procedure. Table 17 presents the loadings of the phonological awareness tasks on the factors after the varimax rotation, revealing that the smaller-unit tasks, phoneme blending and initial isolation, loaded highly on Factor 1, whereas the larger-unit tasks, rhyming discrimination and syllable segmentation, loaded highly on Factor 2. This implies that two factors may underlie the measurement of the four phonological awareness tasks.
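A principal component extraction followed by a varimax rotation can be sketched with standard linear algebra. The correlation matrix below is made up, loosely patterned on the two-cluster structure reported here, and the rotation follows the classic varimax algorithm; this is an illustration, not a reanalysis of the study's data.

```python
import numpy as np

def varimax(loadings, n_iter=100, tol=1e-8):
    """Varimax rotation of a (p x k) loading matrix (Kaiser's algorithm)."""
    L = loadings.copy()
    p, k = L.shape
    R = np.eye(k)
    var_old = 0.0
    for _ in range(n_iter):
        lam = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (lam**3 - lam @ np.diag((lam**2).sum(axis=0)) / p))
        R = u @ vt
        if s.sum() - var_old < tol:
            break
        var_old = s.sum()
    return L @ R

# Hypothetical 4x4 correlation matrix: two small-unit and two large-unit tasks.
corr = np.array([[1.00, 0.45, 0.35, 0.32],
                 [0.45, 1.00, 0.36, 0.33],
                 [0.35, 0.36, 1.00, 0.51],
                 [0.32, 0.33, 0.51, 1.00]])
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]                  # sort components by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])   # retain two components
print(np.round(varimax(loadings), 2))              # rotated loading matrix
```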

The External Aspect of Construct Validity

Relationships to Alphabet Knowledge Test

The mean performances and the standard deviations on the four tests of alphabet name and sound knowledge are displayed in Table 18, along with the possible maximum scores. The letter name knowledge-upper case test had the highest mean score (M = 12.06, SD = 9.92), and the letter sound knowledge-lower case test had the lowest mean score (M = 4.03, SD = 6.67). The predictive correlations between the four phonological awareness tasks and the four alphabet knowledge tests are presented in Table 19. The correlation coefficients were computed based on the number of examinees who actually responded to the phonological awareness task items. Using the Bonferroni method to control for Type I error across the 16 correlations, a p-value of less than .0031 (.05/16 = .0031) was required for significance. None of the predictive correlations between the phonological awareness tasks and the alphabet knowledge tests were statistically significant. The initial isolation task had the highest correlation with the letter sound knowledge-upper case test (r = .25, n = 36, p = .139). The phoneme blending task had the lowest correlation with the letter name knowledge-lower case test (r = -.03, n = 78, p = .767).

Regression Analysis

A forward regression analysis was conducted with the total score on the alphabet sound knowledge upper- and lowercase tests as the dependent variable and the four phonological awareness tasks as the independent variables. The regression analysis was conducted based on the total number of examinees. The mean performance on the alphabet sound knowledge test was 9.06, with a standard deviation of 13.61.

A linear combination of two tasks, initial isolation and phoneme blending, made a significant contribution to explaining the variation in the alphabet sound knowledge test, F(2, 398) = 5.45, p = .005. The sample multiple correlation was .163, indicating that approximately 2% of the variance of the alphabet sound knowledge test in the sample can be accounted for by the linear combination of the initial isolation task and the phoneme blending task. The regression equation is shown below.

Predicted Alphabet Sound = 1.12 (Initial Isolation) − 0.90 (Phoneme Blending) + 9.08

The squared cross-validated correlation coefficient was calculated to evaluate how useful the sample regression equation would be when applied to other examinees in the population (Browne, 1975). The squared cross-validated correlation coefficient was fairly small (R²cv = .019) and was similar in value to the squared sample multiple correlation coefficient (R² = .027).
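Browne's (1975) estimate can be written out explicitly. In the sketch below, the formula is stated as it commonly appears in the regression literature (it should be checked against the original source), with the study's reported R = .163, two predictors, and the N = 401 implied by the F-test degrees of freedom; it reproduces a value close to the reported .019.

```python
def browne_cv_r2(r2, n, k):
    """Squared cross-validity coefficient from the sample R^2, sample size n,
    and number of predictors k, per the formula commonly attributed to
    Browne (1975)."""
    rho2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # adjusted (population) R^2
    return ((n - k - 3) * rho2**2 + rho2) / ((n - 2 * k - 2) * rho2 + k)

r = 0.163      # reported sample multiple correlation
n, k = 401, 2  # F(2, 398) implies N = 401; two predictors retained
print(f"{browne_cv_r2(r**2, n, k):.4f}")  # about 0.0197, matching the reported .019
```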

Group Differentiation

Gender Differences: A series of independent-samples t-tests was conducted to evaluate the relationship between gender and performance on each of the phonological awareness tasks. The Bonferroni procedure was used to control for Type I error across the tests, with a p-value of less than .0125 (.05/4) required for significance. The mean performances and the standard deviations on each phonological awareness task are shown in Table 20. As an index of practical importance, the effect size was calculated as the standardized mean difference. The independent-samples t-tests indicated that the groups did not significantly differ on any of the tasks: rhyming discrimination (t(394) = 0.136, p = .892, d = .014); syllable segmentation (t(392) = 0.627, p = .531, d = .063); initial isolation (t(391) = -0.045, p = .964, d = -.004); and phoneme blending (t(391) = -0.658, p = .511, d = -.070).

Ethnicity Differences: Table 21 displays the means and the standard deviations on each phonological awareness task by ethnic group. A series of one-way analyses of variance was conducted to determine whether there were differences among the ethnic groups. The Bonferroni method was used to control the Type I error rate across the tests (.05/4 = .0125). The ANOVA results revealed no statistically significant differences among the ethnic groups on the phonological awareness tasks: rhyming discrimination (F(4, 312) = 2.42, p = .049, partial η² = .030); syllable segmentation (F(4, 310) = 0.38, p = .826, partial η² = .005); initial isolation (F(4, 309) = 1.16, p = .328, partial η² = .015); and phoneme blending (F(4, 309) = 0.57, p = .687, partial η² = .007).

Socioeconomic Differences: Two socioeconomic groups were identified based on whether the child received free or reduced-price lunch. Approximately 30% of the participants received free or reduced-price lunch and were identified as the lower socioeconomic group. The mean performances and the standard deviations on each of the phonological awareness tasks are shown in Table 22. A series of independent-samples t-tests was conducted to examine the relationship between socioeconomic status and performance on the phonological awareness tasks, using the Bonferroni method to control for Type I error across the tests (.05/4 = .0125). The t-tests indicated no significant relationship between socioeconomic status and performance on the phonological awareness tasks: rhyming discrimination (t(381) = 0.491, p = .624, d = .061); syllable segmentation (t(379) = 0.236, p = .814, d = .027); initial isolation (t(378) = -0.676, p = .500, d = -.077); and phoneme blending (t(378) = 1.137, p = .256, d = .130).
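Each group comparison above pairs a significance test with an effect size. The sketch below illustrates the computations on simulated scores: an independent-samples t-test with Cohen's d (the pooled-SD standardized mean difference) and a one-way ANOVA with eta-squared, which equals partial eta-squared in a one-way design. The group sizes and score distributions are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
boys = rng.normal(3.0, 3.4, 200)    # simulated task scores for two gender groups
girls = rng.normal(3.1, 3.4, 196)

# Independent-samples t-test with Cohen's d.
t, p = stats.ttest_ind(boys, girls)
pooled_sd = np.sqrt(((len(boys) - 1) * boys.var(ddof=1) +
                     (len(girls) - 1) * girls.var(ddof=1)) /
                    (len(boys) + len(girls) - 2))
d = (boys.mean() - girls.mean()) / pooled_sd
print(f"t = {t:.3f}, p = {p:.3f}, d = {d:.3f} (Bonferroni level {0.05 / 4:.4f})")

# One-way ANOVA with eta-squared for several simulated ethnic groups.
groups = [rng.normal(3.0, 3.4, n) for n in (80, 90, 70, 77)]
F, p = stats.f_oneway(*groups)
grand = np.concatenate(groups)
ss_between = sum(len(g) * (g.mean() - grand.mean())**2 for g in groups)
ss_total = ((grand - grand.mean())**2).sum()
print(f"F = {F:.2f}, p = {p:.3f}, eta-squared = {ss_between / ss_total:.3f}")
```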


VI. DISCUSSION

The purpose of the current study was to examine the psychometric characteristics of phonological awareness assessment in pre-kindergarten children. The distinctive feature of this validation study is its pursuit of the six distinguishable but interdependent aspects of unitary construct validity suggested by Messick (1989). Based upon this theoretical framework, the study aimed to empirically integrate various components of evidence into an overall validity judgment supporting the intended score interpretation and the implications of score meaning. The aspects of construct validity the study focused on, and the limitations of the study, are discussed in the following sections, along with a restatement of the six aspects of construct validity.

The Content Aspect of Construct Validity

Content-related validity evidence establishes the theoretical and empirical basis for specifying the boundaries and the structure of the construct domain to be assessed. The theoretical domain entails the scientific theory about the construct, previous research, and one’s own observations. The empirical domain involves the specific set of observed variables that measure the construct (Benson, 1998). Hence, a central matter for content-related evidence is the professional judgment and documentation needed to ensure that all important parts of the construct domain are covered (Messick, 1995).

The content-related validation was accomplished primarily by examining previous research on phonological awareness development in young children. The initial investigators, Schwanenflugel and Blake, reviewed approximately 64 studies that used a wide variety of phonological awareness tasks to measure three- to seven-year-old children’s knowledge of sound segments, with the intent of designing a phonological awareness intervention for the ongoing research project “PAVEd for Success” (Hamilton, Schwanenflugel, Neuharth-Pritchett, & Restrepo, 2002). They summarized the studies by population age, types of tasks used, study design, and findings. The initial investigators then selected a subset of eight phonological awareness tasks that were considered to be significantly related to later reading and decoding skills. The initial investigators also took the developmental path of phonological awareness into consideration. They included tasks considered to lie at the beginning of the developmental continuum, such as the rhyme and syllable tasks, in order to measure beginning levels of phonological awareness; the phoneme and grapheme tasks were included to assess later development of phonological awareness. The tasks and items were drawn directly from The Phonological Awareness Test (Robertson & Salter, 1997). The initial investigators conducted a pilot study with 19 pre-kindergarten and 11 kindergarten children from a local elementary school, with parental consent, during the months of December and January of 2002.

The initial investigators systematically investigated and brought the boundaries of the theoretical domain into focus based on a series of previous studies concerning the construct. Furthermore, the tasks and items used in the study were drawn from an instrument with established norms. Nonetheless, some of the phonological awareness tasks proved potentially incompatible with this age population during task administration (cf. the item analysis results). For instance, some of the tasks were dropped from the battery because their administration took too long for this age level. As another example, the initial isolation and phoneme blending tasks seemed to be too difficult for this age population. This might be because the intended age population of The Phonological Awareness Test was discordant with the age population of the study. The test manual indicates that administering The Phonological Awareness Test to children younger than 5 years may not be appropriate, since such children are normally not developmentally ready to perform all of the assessment tasks. Yet the manual also notes that it is left to the researcher’s discretion whether administering particular tasks would be beneficial for obtaining useful information (Robertson & Salter, 1997).

As discussed earlier, understanding the developmental sequence of phonological awareness is important because the developmental levels of task difficulty are directly related to issues of assessment validity. The child’s assessed level of phonological awareness can be dramatically affected by the difficulty or complexity of the tasks; that is, different types of tasks demand different levels of cognitive and linguistic ability from the child.

Task items that exceed the subjects’ attention span or developmental level of task difficulty may lead to construct invalidity because such tasks are irrelevantly difficult for the age population. Therefore, the tasks should be revised for this age level. For example, the initial investigators might need to reconstruct items using words more familiar to this age population.

One way of reconstructing the tasks or items is to assemble a panel of experts to evaluate content and format relevance. Content experts’ judgments about the degree to which an item reflects the content defined by the domain specification can provide ongoing professional test development and systematic documentation of the consensus of multiple judges (Messick, 1989). The experts’ judgments of content relevance can be summarized numerically with statistical techniques, for example, the index of item congruence (Hambleton, 1980). The index of item congruence ranges from -1 to +1, with the highest value of +1 indicating that all content experts agree that the item is congruent with the domain specification. In addition, factor analysis or multidimensional scaling of relevance ratings by multiple experts can be a useful tool for examining the theoretical boundaries of the construct (Benson, 1998; Messick, 1989).
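As a simplified numerical illustration of such expert ratings, the sketch below averages judgments coded -1, 0, or +1; it shares the -1 to +1 range and the interpretation of +1 as unanimous agreement, but it is a stand-in, not Hambleton's (1980) exact formula, which also accounts for the number of content domains.

```python
# Simplified congruence rating: each expert rates an item +1 (congruent with the
# domain specification), 0 (unsure), or -1 (incongruent). The mean rating ranges
# from -1 to +1, with +1 meaning all experts judge the item congruent. This is a
# simplified stand-in for Hambleton's (1980) index, not its exact formula.

def congruence_index(ratings):
    return sum(ratings) / len(ratings)

print(congruence_index([+1, +1, +1, +1]))  # 1.0: unanimous congruence
print(congruence_index([+1, 0, -1, +1]))   # 0.25: weak agreement
```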

The Substantive Aspect of Construct Validity

The substantive component of construct validity incorporates content properties and response consistencies. The substantive aspect provides theoretical rationales and empirical evidence for response consistencies or performance regularities that manifest the domain specifications (Loevinger, 1957; Messick, 1989, 1995).

The substantive aspect of validity in the present study focused on the structure of the task administration and scoring procedures, in order to standardize subjective judgments and to show that the scores were based on the completion of a defined process. In consideration of the concentration levels and cognitive abilities of this age population, the initial investigators set up two types of ceiling rules for all subtasks. The starting rule was that the actual task items were administered only to subjects who responded correctly to at least one of the three practice items. The termination rule was that task administration was stopped after three consecutive incorrect responses. Setting these ceilings might be one of the reasons that some tasks turned out to be too difficult; for instance, the majority of subjects did not qualify to take the actual task items on the initial isolation and phoneme blending tasks. Termination of administration after three consecutive incorrect responses also reduced the number of respondents as administration progressed, and this reduction might have influenced the levels of item difficulty. Use of the ceiling rules for both starting and termination may require careful inspection, because the observed set of responses used to estimate the subjects’ abilities to perform the tasks successfully is restricted by the application of the ceiling.

The empirical evidence of response consistency in the study was derived from the correlation patterns among the items on each task. Internal consistency was measured by coefficient alpha, revealing that all four tasks had high internal consistencies, with α > .85. The high internal consistencies of the phonological awareness tasks are likely, in part, a consequence of the task difficulty or of setting the ceiling rules. For example, if a particular task was difficult for most of the subjects, the variance would be small, and the task reliability would increase. Accordingly, one should be cautious in interpreting coefficient alpha, since task reliability is an important consideration in task selection and can be affected by multiple factors, such as the variance, the length of the task, or the quality of the instrument itself.

In addition, Messick (1989) suggests a combined convergent-discriminant strategy for test construction as an elaboration of the substantive approach. The convergent-discriminant strategy is to develop measures of two or more distinct constructs at the same time. If items in the combined pool correlate more highly with their own purported construct score than with the scores for other constructs, the items are kept on the given construct scale. Hence, item selection can be systematically based upon convergent and discriminant evidence, while method contaminants are suppressed at the same time. The present study was not able to employ such a strategy, since the items were drawn from a commercial assessment instrument. If the whole task construction process had been carried out, it would have been feasible to conduct this elaboration of the substantive approach, using convergent and discriminant evidence for item selection to make explicit reference to task coverage and to attune the tasks rationally to the nature of the construct.

The Structural Aspect of Construct Validity

The structural aspect of construct validity entails analyses of the internal structure of the tasks, appraising the relationships among the task items and the theory of the construct domain. Messick (1995) notes that the structural aspect of validity should evaluate not only the selection or construction of assessment tasks related to the domain construct but also the rational development of construct-based scoring criteria, rubrics, and guidelines. The structural component of validation in this study subsumes the empirical analyses of item difficulty, item discrimination, and factor analysis, in addition to the task intercorrelations.

Item Analyses

The results of the item analyses obtained in the current study agree with previous studies regarding the levels of task difficulty. Generally, the rhyme task is thought to be the easiest, and phoneme deletion or phoneme segmentation is considered the most difficult among phonological awareness tasks (Hoien et al., 1995; Stanovich et al., 1984; Yopp, 1988). Likewise, the present study found that rhyming discrimination was the easiest, while initial isolation was the most difficult, among the phonological awareness tasks used in the study.

As noted earlier, the item analyses were conducted under two sets of conditions.

First, there was the preliminary item condition, which took the three practice items into consideration. In this case, the preliminary item was scored as 1 if the subject responded correctly to at least one of the three practice items; otherwise, it was scored as 0. Then there was the actual item condition, which considered only the actual task items. The item difficulties of the preliminary items in Tables 6, 8, 10, and 12 therefore represent the proportions of subjects who qualified to take the actual task items. This was based on the assumption that a score of 0 in the preliminary item condition differed in meaning from a score of 0 in the actual item condition.

The second set of conditions concerned the number of subjects included in the data analyses. The item analyses were conducted based on the total number of subjects (N = 415) and based on the number of subjects who actually responded to the items. Subjects for whom task administration had been terminated after three consecutive incorrect responses were excluded from the latter set of item analyses for the items they did not take. Because examinees experienced great difficulty with some of the tasks, this strategy was intended to ensure more suitable item analyses.

It was assumed that the item difficulties on each task would decrease systematically as the task administration progressed, because of the termination rule. The results of the item analyses indicated that the levels of item difficulty were unsystematically distributed on the rhyming discrimination and syllable segmentation tasks whether the total number of subjects or the actual number of respondents was considered. On the initial isolation task, the item difficulties seemed to decrease systematically as the administration progressed when the analyses were based on the total number of subjects; however, they were unsystematically distributed when the actual number of respondents was used. In contrast, on the phoneme blending task the levels of item difficulty seemed to decrease systematically under both analyses.

Interestingly, there were great discrepancies in the levels of item difficulty on the initial isolation task between the analyses based on the total number of subjects and those based on the number of subjects who actually responded to the items. When the analyses considered the number of subjects who actually responded, the levels of item difficulty increased greatly. The item difficulties ranged from .053 (the last actual item, laugh) to .190 (the preliminary item) when the analyses were based on the total number of subjects, but from .190 (the preliminary item) to .857 (the fourth actual item, fudge) when they were based on the number of subjects who actually responded. Similarly, the levels of item difficulty on the phoneme blending task ranged from .031 (the last actual item, /s – l – i – p – ər/) to .308 (the preliminary item) when the total number of subjects was used; when the analyses were based on the number of subjects who actually responded, the item difficulties ranged from .216 (the second actual item, /n – ə/) to .598 (the first actual item, /b – oi/) (see Tables 10 and 12).

Item discrimination for each item was estimated by the point-biserial correlation coefficient. None of the items on any of the tasks had an item discrimination of less than .35, revealing that the items discriminated well between subjects of relatively high and relatively low ability.

Separate item analyses of the three practice items, instead of the combined preliminary item, would provide valuable information for evaluating the appropriateness of the ceiling rules and estimating subjects’ abilities to perform the tasks successfully. That is, the ability of subjects who responded to all three practice items incorrectly is likely to differ from the ability of subjects who responded to only one practice item incorrectly. In that sense, it would be desirable to record all of the subjects’ responses on the practice items, in addition to the actual task items, so that more detailed empirical analyses of those items could be used to estimate the subjects’ potential abilities to perform the tasks successfully.


Task Intercorrelations

Findings of previous studies indicated that various tasks measuring knowledge of sound segments were correlated with one another (e.g., Hoien et al., 1995; Stanovich et al., 1984; Yopp, 1988). Likewise, the four phonological awareness tasks used in this study were significantly intercorrelated, suggesting that they tap much of the same construct underlying the measurements.

Factor Analysis

Since statistically significant interrelationships were obtained in the correlation matrix, a principal component factor analysis was conducted to examine the underlying structure. Using the eigenvalues-greater-than-one criterion, the factor analysis extracted one factor, which accounted for 54.8% of the total variance. Each of the tasks loaded strongly on this factor, suggesting that a single construct may explain each task well. Yopp (1988) conducted a factor analysis based on ten phonological awareness tasks and obtained a two-factor solution; she labeled the first factor Simple Phonemic Awareness and the second factor Compound Phonemic Awareness. Hoien and his associates (1995) conducted a factor analysis with six phonological awareness tasks and found a three-factor solution: a phoneme factor, a syllable factor, and a rhyme factor. The present study does not agree with these two studies regarding dimensionality, which might be because the current study conducted factor analysis on a limited number of tasks.

The plot of eigenvalues, also used to determine the number of factors to rotate, showed that a two-factor solution might be appropriate. Since an additional 18.3% of the total variance was accounted for by the second factor, two factors were extracted by specifying the number of factors in the analysis. The phoneme blending and initial isolation tasks loaded highly on the first factor, with loadings of .88 and .79, respectively, while the rhyming discrimination and syllable segmentation tasks loaded highly on the second factor, with loadings of .81 and .81, respectively. This finding is somewhat consistent with the findings of Hoien and his colleagues (1995), which indicated that the ability to analyze smaller units, phonemes, is separable from the ability to analyze larger units, rhymes or syllables. Since the scree test yields a more accurate analysis than the eigenvalues-greater-than-one criterion (Green, Salkind, & Akey, 1997), and since the second factor accounted for a large amount of the total variance, it is concluded that two factors underlie the construct of phonological awareness. However, it would be advisable to conduct a confirmatory factor analysis to verify the underlying structure of phonological awareness found in the present study.

The Generalizability Aspect of Construct Validity

The generalizability component examines the replicability or consistency of assessment results across population groups, situations, time periods, and task domains, in order to set the boundaries of score meaning (Messick, 1995). According to Messick (1989), the generality of construct meaning can be evaluated by any or all of the techniques of construct validation. Assessing comparable correlation patterns with other measures, examining test scores across random samples of different groups (e.g., ethnic, cultural, or SES groups), and combining indicators of test-retest reliability and construct meaning are examples of techniques for appraising the generalizability of score meaning. The present study therefore also assesses the generality of construct meaning, since the purpose of the study was to empirically follow the construct validation process advocated by Messick (1989), although it provides only limited evidence about the consistency of assessment results across multiple levels of the random facets of phonological awareness assessment.

Devising a more direct way to appraise the generalizability aspect of construct validity would be beneficial. For example, Benson (1998) recommended generalizability theory as a useful method for differentiating types of measurement error and for providing evidence of how well the empirical domain represents the theoretical domain. Furthermore, she suggested an informative pair of studies combining confirmatory factor analysis and generalizability theory: confirmatory factor analysis determines how well a specific set of observed variables fits the structure of the theoretical domain, and generalizability theory evaluates how adequately the items represent the empirical domain.

The External Aspect of Construct Validity

The external aspect of construct validity evaluates how well the assessed construct empirically correlates, in expected ways, with different constructs and with characteristics of the subjects. Evidence about the external structure becomes especially important if assessment results are used for selection, placement, licensure, or program evaluation (Messick, 1995). The present study examines the empirical relationships between the phonological awareness tasks and the alphabet knowledge tests through correlation coefficients and multiple regression analysis, as well as group differentiation, to establish the external evidence.

Relationships to Alphabet Knowledge Tests

None of the phonological awareness tasks was statistically significantly correlated with the four alphabet knowledge tests. This finding contradicts the findings of Lonigan and his associates (2000), who reported a predictive relation between phonological awareness and later letter knowledge. The conflict might be due to the difference in the time interval between the administration of the phonological awareness tasks and the alphabet knowledge test: Lonigan and his associates (2000) had about a 12-month interval between the phonological awareness tasks at Time 1 and the letter knowledge test at Time 2, whereas the current study administered the alphabet name and sound knowledge tests after an interval of about four months. The nonsignificant correlations might also be the result of unknown characteristics of the subjects that affected the test scores. The subjects in the study came from various ethnic and language backgrounds, so there might be outliers or confounding variables, such as limited English proficiency or speech impairment, that affected the score interpretation. Examining outliers through the residuals would be beneficial, as would gathering more detailed information about language-related impairments. Further investigation with data collected later than in the current study, and with more explicit reading tests, might clarify whether phonological awareness is a significant predictor of later reading and decoding skills.

Regression Analysis

A linear combination of the initial isolation and phoneme blending tasks made a statistically significant contribution to accounting for the variance in the alphabet sound knowledge upper- and lowercase test, although only 2% of the variance was accounted for by the linear combination, and the squared cross-validated correlation coefficient was similarly small (.019). The result of the regression analysis is similar to the finding of Hoien and his colleagues (1995) that phonemic awareness proved a more potent predictor of early reading acquisition than syllable or rhyme tasks. Because the linear combination of the initial isolation and phoneme blending tasks explained only 2% of the variance of the alphabet sound knowledge test, one needs to be cautious in interpreting the result of the regression analysis. In order to evaluate the relationship between phonological awareness and reading development more precisely, assessing more explicit measures of reading and decoding skills in these children one or more years later might be useful.

Group Differentiation

The current study's results are consistent with the findings of Burt and her associates (1999) that there is no significant gender difference in performance on phonological awareness tasks. The current study also found no statistically significant differences in phonological awareness task performance between the lower socioeconomic (SES) group and the upper SES group; in contrast, Burt and her colleagues (1999) found that the upper socioeconomic group significantly outperformed the lower group. Since the subjects were ethnically diverse, and about 24.2% of the subjects spoke a language other than English as a first language, the present study also took ethnic differences into consideration. There were no statistically significant differences among the ethnic groups. The present study thus found that the phonological awareness tasks do not seem to show gender, SES, or ethnicity differences when these are tested separately.

In addition, studies with a multitrait-multimethod matrix and structural equation modeling can provide valuable information about the external structure of the assessments. A multitrait-multimethod matrix provides an empirical collection of convergent and discriminant evidence by displaying all of the intercorrelations generated when each of several constructs or traits is measured by each of several methods. The multitrait-multimethod matrix therefore allows estimation of the relative contributions of trait and method variance to the particular construct measures (Messick, 1989). Conducting such a study would be beneficial because the multitrait-multimethod approach entails sound judgment about the constructs to be included in the matrix and offers provisional evidence to support the nomological validity of the construct. Benson (1998) suggests structural equation modeling (SEM) for examining the external aspect of construct validation: the measurement model links a specific set of items to the hypothesized structure of the construct, and the structural model links the constructs within the nomological network, that is, the theoretical constructs and the hypothesized relationships among them.

The Consequential Aspect of Construct Validity

The consequential component of construct validity is fundamentally concerned with any negative implications for individuals or groups arising from construct underrepresentation or construct-irrelevant variance. Although some of the tasks used in the current study might be too difficult for this age population, the levels of task difficulty did not seem to affect the scores of particular individuals or groups, such as different ethnic or SES groups. Additionally, the tasks, which measured the subjects’ sensitivity to, and ability to analyze, the spoken language segments that comprise words, agree with the purpose of the instrument, which was designed to diagnose deficits in phonological awareness and phoneme-grapheme correspondence.

As noted earlier, validity evidence relevant to all six aspects needs to be accumulated into an overall validity judgment to support score-based interpretations and action implications. This process includes sampling a relevant domain, constructing relevant assessment tasks, administering and scoring the tasks appropriately, and paying careful attention to sources of task invalidity. Figure 4 displays the assessment construction procedures corresponding to the six aspects of the construct validation procedure.


VII. CONCLUSION

The present study provides information about the psychometric characteristics of phonological awareness assessment in pre-kindergarten children. Above all, the study aimed to empirically implement the theoretical framework for unitary construct validity, which integrates various sources of evidence to support the validity of the scores derived from a test.

The current study confirms previous findings regarding the developmental levels of task difficulty. Although some of the tasks seemed to be too difficult for this age level, the study found that two factors underlie the construct of phonological awareness. These two factors accounted for 73.12% of the total variance, supporting the structural conception of phonological awareness. Furthermore, a linear combination of the initial isolation and phoneme blending tasks, both from the first factor, supports predictive validity for the initial stage of reading acquisition, although the practical importance was fairly small. In addition, the initial investigators modified the technical qualities of the testing system, establishing administration standards for the age level as well as appropriate task administration and scoring procedures. The levels of task difficulty do not seem to affect particular types of individuals or groups. From the various components of validity evidence, it is concluded that the phonological awareness tasks in the study provide valuable information about the knowledge of sound segments in pre-kindergarten children.

Indeed, the present study carries out the unitary conception of construct validation, which accumulates content, criteria, and consequences together to form a scientific basis for score-based interpretations, the utility of score meaning, and value implications as a ground for action. One should note that validity is a matter of degree rather than an all-or-none property. The degree to which the score interpretation and the implications of score meaning remain valid across individuals or populations, settings, or task contexts is a continual issue, because the interpretation of scores on the construct changes as social conditions shift (Benson, 1998; Messick, 1989). This is why validity is an evolving property and validation a continual process. Therefore, ongoing validation studies are necessary to reestablish validity in order for a test to remain valid over time.


REFERENCES

Adams, M. J. (1990). Beginning to read: Thinking and learning about print.

Cambridge, MA: MIT Press.

American Association, American Psychological Association, &

National Council on Measurement in (1999). Standards for

educational and . Washington DC: Author.

Backman, J. (1983). The role of psycholinguistic skills in reading acquisition: A look at

early readers. Reading Research Quarterly, 18, 466-479.

Benson, J. (1998). Developing a strong program of construct validation: A test anxiety

example. Educational Measurement: Issues and Practice, 17(1), 10-17, 22.

Bishop, D. V. M., & Adams, C. (1990). A prospective study of the relationship between

Specific language impairment, phonological disorders, and reading retardation.

Journal of Child and Psychiatry and Allied Disciplines, 31, 1027-

1050.

Blachman, B. A. (1994). Early literacy acquisition: The role of phonological awareness.

In G. P. Wallach & K. G. Butler (Eds.), Language learning disabilities in school-

age children and adolescents: Some principles and applications (pp. 253-274).

New York, NY: Macmillan.

Bradley, L. L., & Bryant, P. E. (1983). Categorizing sounds and learning to read: A

causal connection. Nature, 301, 419-421.

Browne, M. W. (1975). Predictive validity of a linear regression equation. British 64

Journal of Mathematical and Statistical Psychology, 28, 79-87.

Bryant, P. E., MacLean, M., & Bradley, L. L. (1990). Rhyme, language, and children’s

reading. Applied , 11, 237-252.

Bryant, P. E., MacLean, M., Bradley, L. L., & Crossland, J. (1990). Rhyme and

alliteration, phoneme detection, and learning to read. ,

26, 429-438.

Burt, L., Holm, A., & Dodd, B. (1999). Phonological awareness skills of 4-year-old

British children: An assessment and developmental data. International Journal of

Language & Communication Disorders, 34, 311-335.

Chall, J. S., Jacobs, V., & Baldwin, L. (1990). The reading crisis: Why poor children fall

behind. Cambridge, MA: Harvard University Press.

Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the

behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Cronbach, L., & Meehl, P. (1955). Construct validity in psychological tests.

Psychological Bulletin, 52, 281-302.

Cunningham, A. E., & Stanovich, K. E. (1997). Early reading acquisition and its relation

to reading experience and ability 10 years later. Developmental Psychology, 33,

934-945.

Goswami, U. (1986). Children’s use of analogy in learning to read: A developmental

study. Journal of Experimental Child Psychology, 42, 73-83.

Goswami, U. (1988). Children’s use of analogy in learning to spell. British Journal of

Developmental Psychology, 6, 21-33. 65

Goswami, U., & Bryant, P. (1990). Phonological skills and learning to read. Hillsdale,

NJ: Lawrence Erlbaum.

Green, S. B., Salkind, N. J., & Akey, T. M. (1997). Using SPSS for windows: Analyzing

and understanding data (2nd ed.). Upper Saddle River, NJ: Prentice Hall.

Hambleton, R. K. (1980). Test score validity and standard-setting methods. In R. A.

(Ed.). Criterion-referenced measurement: The state of the art (pp. 80-123).

Baltimore, MD: Johns Hopkins University Press.

Hamilton, C. E., Schwanenflugel, P., Neuharth-Pritchett, S., & Restrepo, M. A. (2002).

Data from on-going research PAVEd for Success, Unpublished.

Hill, S. (1999). Phonics. York, ME: Stenhouse Publishers.

Hills, J. R. (1981). Measurement and evaluation in the classroom (2nd ed.). Columbus,

OH: Charles E. Merrill.

Hoien, T., Lundberg, I., Stanovich, K. S., & Bjaalid, I. (1995). Component of

phonological awareness. Reading and Writing: An Interdisciplinary Journal, 7,

171-188.

Johnston, P., & Allington, R. (1991). Remediation. In R. Barr, M. Kamil, P. Mosenthal,

& P. D. Pearson (Eds.), Handbook of reading research (pp. 984-1012). New

York: Longman.

Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967).

Perception of the speech code. Psychological Review, 74, 431-461.

Liberman, I. Y. (1978). Segmentation of the spoken word and reading acquisition.

Bulletin of the Orton Society, 23, 65-77. 66

Liberman, I. Y., Shankweiler, D. P., Fischer, F. W., & Carter, B. (1974). Explicit

syllable and phoneme segmentation in the young child. Journal of Experimental

Child Psychology, 18, 201-212.

Liberman, I. Y., Shankweiler, D. P., & Liberman, A. M. (1989). The alphabetic principle

and learning to read. In D. Shankweiler & I. Y. Liberman (Eds.), Phonology and

reading disability: Solving the reading puzzle (pp. 1-33). Ann Arbor: University

of Michigan Press.

Loevinger, J. (1957). Objective tests as instruments of psychological theory.

Psychological Reports, 3, 635-694.

Lonigan, C. J., Burgess, S. R., Anthony, J. L., & Barker, T. A. (1998). Development of

phonological sensitivity in 2-to-5-year-old children. Journal of Educational

Psychology, 90, 294-311.

Lonigan, C. J., Burgess, S. R., & Anthony, J. L. (2000). Development of emergent

literacy and early reading skills in preschool children: Evidence from a latent-

variable longitudinal study. Developmental Psychology, 36, 596-613.

MacLean, M., Bryant, P. E., & Bradley, L. L. (1987). Rhymes, nursery rhymes, and

reading in early childhood. Merrill-Palmer Quarterly, 33, 11-37.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.,

pp. 13-103). New York: Macmillan.

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from

persons’ responses and performances as scientific inquiry into score meaning.

American Psychologist, 50, 741-749.

Miller, M. D., & Linn, R. L. (2000). Validation of performance-based assessments.

Applied Psychological Measurement, 24, 367-378.

Robertson, C., & Salter, W. (1997). The phonological awareness test. East Moline, IL:

LinguiSystems.

Share, D. L., & Stanovich, K. E. (1995). Cognitive processes in early reading

development: Accommodating individual differences into a model of acquisition.

Issues in Education: Contributions from Educational Psychology, 1, 1-57.

Stahl, S. A., & Murray, B. A. (1994). Defining phonological awareness and its

relationship to early reading. Journal of Educational Psychology, 86, 221-234.

Stanovich, K. E. (1986). Matthew effects in reading: Some consequences of individual

differences in the acquisition of literacy. Reading Research Quarterly, 21, 360-

407.

Stanovich, K. E. (1992). Speculations on the cause and consequences of individual

differences in early reading acquisition. In P. B. Gough, L. C. Ehri, & R. Treiman

(Eds.), Reading acquisition (pp. 307-342). Hillsdale, NJ: Lawrence Erlbaum.

Stanovich, K. E., Cunningham, A. E., & Cramer, B. B. (1984). Assessing phonological

awareness in kindergarten children: Issues of task comparability. Journal of

Experimental Child Psychology, 38, 175-190.

Stevenson, H. W., & Newman, R. S. (1986). Long-term prediction of achievement and

attitudes in mathematics and reading. Child Development, 57, 646-659.

Sulzby, E., & Teale, W. (1991). Emergent literacy. In R. Barr, M. Kamil, P.

Mosenthal, & P. D. Pearson (Eds.), Handbook of reading research (pp. 727-758).

New York: Longman.

Torgesen, J. K. (1999). Phonologically based reading disabilities: Toward a coherent

theory of one kind of learning disability. In R. J. Sternberg & L. Spear-Swerling

(Eds.), Perspectives on learning disabilities: Biological, cognitive, contextual (pp.

106-135). Boulder, CO: Westview Press.

Torgesen, J. K., & Mathes, P. G. (2000). A basic guide to understanding, assessing, and

teaching phonological awareness. Austin, TX: PRO-ED.

Wagner, R. K., & Torgesen, J. K. (1987). The nature of phonological processing and its

causal role in the acquisition of reading skills. Psychological Bulletin, 101, 192-

212.

Wagner, R. K., Torgesen, J. K., Rashotte, C. A., Hecht, S. A., Barker, T. A., Burgess, S.

R., Donahue, J., & Garon, T. (1997). Changing relations between phonological

processing abilities and word-level reading as children develop from beginning to

skilled readers: A 5-year longitudinal study. Developmental Psychology, 33, 468-

479.

Whitehurst, G. J., & Lonigan, C. J. (1998). Child development and emergent literacy.

Child Development, 69, 848-872.

Yopp, H. K. (1988). The validity and reliability of phonemic awareness tests. Reading

Research Quarterly, 23, 159-177.

Yopp, H. K., & Yopp, R. H. (2000). Supporting phonemic awareness development in the

classroom. The Reading Teacher, 54, 130-143.


Table 1

The Maximum Scores, the Means, and the Standard Deviations for Phonological

Awareness Tasks Based on the Preliminary Item Condition

Task Max. Score M SD N

Rhyming discrimination 11 3.64 3.88 415

Syllable segmentation 11 2.35 2.93 415

Initial isolation 11 0.87 2.54 415

Phonemes blending 11 1.05 2.17 415

Note. The preliminary item is scored 0 if the examinee responded to all three practice items incorrectly; otherwise, it is scored 1.
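For concreteness, the scoring rule behind the two item conditions reported in Tables 1 through 13 can be written out. The following is a minimal illustrative sketch, not the procedure actually used in the study; it assumes, consistent with Tables 1 and 2, that each task has three practice items and ten 0/1 test items, with the preliminary item supplying the eleventh point:

    def score_task(practice, items):
        """Score one task under the preliminary and actual item conditions.

        practice: three 0/1 practice-item scores.
        items: ten 0/1 test-item scores (items never reached count as 0).
        """
        actual = sum(items)                               # actual item condition (max 10)
        preliminary_item = 0 if sum(practice) == 0 else 1 # see the note above
        return preliminary_item + actual, actual          # (max 11, max 10)

    # Example: one practice item passed, two test items correct.
    print(score_task([0, 1, 0], [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]))  # (3, 2)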


Table 2

The Maximum Scores, the Means, and the Standard Deviations for Phonological

Awareness Tasks Based on the Actual Item Condition

Task Max. Score M SD N

Rhyming discrimination 10 3.10 3.46 415

Syllable segmentation 10 1.81 2.58 415

Initial isolation 10 0.68 2.29 415

Phonemes blending 10 0.74 1.86 415


Table 3

Coefficients Alpha and the Standard Errors of Measurement for Phonological Awareness

Tasks Based on the Preliminary Item Condition

Task α SEM N

Rhyming discrimination .93 1.02 415

Syllable segmentation .88 0.98 415

Initial isolation .97 0.45 415

Phonemes blending .89 0.71 415

Note. The preliminary item is scored 0 if the examinee responded to all three practice items incorrectly; otherwise, it is scored 1.
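The standard errors of measurement in Table 3 are consistent with the classical test theory formula SEM = SD√(1 − α), applied to the standard deviations in Table 1. A minimal check in Python (small discrepancies from the reported 1.02, 0.98, 0.45, and 0.71 reflect the rounding of α and SD):

    import math

    # (alpha, SD) pairs: alpha from Table 3, SD from Table 1 (preliminary condition).
    tasks = {
        "Rhyming discrimination": (0.93, 3.88),
        "Syllable segmentation": (0.88, 2.93),
        "Initial isolation": (0.97, 2.54),
        "Phonemes blending": (0.89, 2.17),
    }
    for name, (alpha, sd) in tasks.items():
        print(name, round(sd * math.sqrt(1 - alpha), 2))  # classical SEM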


Table 4

Coefficients Alpha and the Standard Errors of Measurement for Phonological

Awareness Tasks Based on the Actual Item Condition

Task α SEM N

Rhyming discrimination .92 0.98 415

Syllable segmentation .88 0.93 415

Initial isolation .98 0.34 415

Phonemes blending .89 0.60 415


Table 5

The Mean Levels of Task Difficulty of Phonological Awareness Tasks

Task                      Preliminary item condition   Actual item condition   N

Rhyming discrimination .330 .310 415

Syllable segmentation .214 .181 415

Initial isolation .079 .067 415

Phonemes blending .095 .074 415


Table 6

Item Analyses for Rhyming Discrimination Task Based on the Preliminary Item

Condition

Item               Item difficulty^a   Item difficulty^b   n^b   Item discrimination^a
Preliminary        .542                .542                415   .823
book – look        .412                .760                225   .759
fun – run          .417                .772                224   .782
ring – rat         .222                .414                222   .484
box – mess         .222                .449                205   .560
fish – dish        .371                .762                202   .812
mop – hop          .357                .767                193   .813
shoe – fan         .219                .484                188   .601
sweater – better   .347                .778                185   .802
camper – hamper    .361                .829                181   .817
pudding – table    .169                .393                178   .565

Note. The preliminary item is scored 0 if the examinee responded to all three practice items incorrectly; otherwise, it is scored 1.
^a Item difficulties and item discriminations are based on the total number of examinees (N = 415). ^b Item difficulties are based on the number of examinees who actually responded to the items.
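For readers replicating the item analyses in Tables 6 through 13, the three indices can be computed from a 0/1 response matrix. The sketch below is one conventional implementation, assuming that difficulty is the proportion correct and that discrimination is the item-total (point-biserial) correlation; the thesis does not restate its exact formulas, so this is an illustration rather than the study's procedure:

    import numpy as np

    def item_analysis(responses):
        """responses: (n_examinees, n_items) 0/1 array; np.nan marks items not reached."""
        r = np.asarray(responses, dtype=float)
        answered = ~np.isnan(r)
        scored = np.nan_to_num(r)                      # not-reached items scored 0
        difficulty_all = scored.mean(axis=0)           # superscript a columns
        difficulty_responded = scored.sum(axis=0) / answered.sum(axis=0)  # superscript b
        total = scored.sum(axis=1)                     # task total score
        discrimination = np.array(
            [np.corrcoef(scored[:, j], total)[0, 1] for j in range(r.shape[1])]
        )
        return difficulty_all, difficulty_responded, answered.sum(axis=0), discrimination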


Table 7

Item Analyses for Rhyming Discrimination Task Based on the Actual Item Condition

Item               Item difficulty^a   Item difficulty^b   n^b   Item discrimination^a
book – look        .412                .760                225   .736
fun – run          .417                .772                224   .761
ring – rat         .222                .414                222   .471
box – mess         .222                .449                205   .558
fish – dish        .371                .762                202   .809
mop – hop          .357                .767                193   .814
shoe – fan         .219                .484                188   .604
sweater – better   .347                .778                185   .803
camper – hamper    .361                .829                181   .817
pudding – table    .169                .393                178   .576

^a Item difficulties and item discriminations are based on the total number of examinees (N = 415). ^b Item difficulties are based on the number of examinees who actually responded to the items.


Table 8

Item Analyses for Syllable Segmentation Task Based on the Preliminary Item Condition

Item           Item difficulty^a   Item difficulty^b   n^b   Item discrimination^a
Preliminary    .540                .540                415   .650
pizza          .316                .585                224   .672
watermelon     .087                .162                222   .466
fix            .337                .639                219   .575
calendar       .166                .375                184   .352
television     .089                .204                181   .572
moose          .275                .659                173   .634
elephant       .106                .273                161   .587
pillow         .178                .481                154   .695
kindergarten   .070                .195                149   .565
candy          .190                .552                143   .727

Note. The preliminary item is scored 0 if the examinee responded to all three practice items incorrectly; otherwise, it is scored 1.
^a Item difficulties and item discriminations are based on the total number of examinees (N = 415). ^b Item difficulties are based on the number of examinees who actually responded to the items.


Table 9

Item Analyses for Syllable Segmentation Task Based on the Actual Item Condition

Item           Item difficulty^a   Item difficulty^b   n^b   Item discrimination^a
pizza          .316                .585                224   .633
watermelon     .087                .162                222   .475
fix            .337                .639                219   .514
calendar       .166                .375                184   .664
television     .089                .204                181   .597
moose          .275                .659                173   .604
elephant       .106                .273                161   .608
pillow         .178                .481                154   .709
kindergarten   .070                .195                149   .596
candy          .190                .552                143   .742

^a Item difficulties and item discriminations are based on the total number of examinees (N = 415). ^b Item difficulties are based on the number of examinees who actually responded to the items.


Table 10

Item Analyses for Initial Isolation Task Based on the Preliminary Item Condition

Item          Item difficulty^a   Item difficulty^b   n^b   Item discrimination^a
Preliminary   .190                .190                415   .609
bite          .082                .430                79    .953
toy           .075                .397                78    .897
dinosaur      .065                .355                76    .862
fudge         .072                .857                35    .927
nose          .072                .833                36    .900
apple         .065                .750                36    .881
garage        .063                .844                36    .722
happy         .063                .743                35    .890
chalk         .065                .794                34    .862
laugh         .053                .647                34    .827

Note. The preliminary item is scored 0 if the examinee responded to all three practice items incorrectly; otherwise, it is scored 1.
^a Item difficulties and item discriminations are based on the total number of examinees (N = 415). ^b Item difficulties are based on the number of examinees who actually responded to the items.


Table 11

Item Analyses for Initial Isolation Task Based on the Actual Item Condition

Item          Item difficulty^a   Item difficulty^b   n^b   Item discrimination^a
bite          .082                .430                79    .955
toy           .075                .397                78    .898
dinosaur      .065                .355                76    .867
fudge         .072                .857                35    .934
nose          .072                .833                36    .904
apple         .065                .750                36    .888
garage        .063                .844                36    .848
happy         .063                .743                35    .901
chalk         .065                .794                34    .867
laugh         .053                .647                34    .839

^a Item difficulties and item discriminations are based on the total number of examinees (N = 415). ^b Item difficulties are based on the number of examinees who actually responded to the items.


Table 12

Item Analyses for Phoneme Blending Task Based on the Preliminary Item Condition

Item                  Item difficulty^a   Item difficulty^b   n^b   Item discrimination^a

Preliminary .308 .308 415 .600

/b – oi/ .183 .598 127 .772

/n – ç/ .065 .216 125 .579

/p – ö/ .067 .286 98 .647

/s – i – t/ .092 .396 96 .651

/f – l – î/ .087 .456 70 .743

/m – ou – s/ .082 .472 72 .663

/k – î – n – d/ .051 .313 67 .630

/s – n – a – p/ .043 .327 55 .646

/m – i – l – k/ .043 .316 57 .566

/s – l – i – p – çr/ .031 .236 55 .589

Note. The preliminary item is scored 0 if the examinee responded to all three practice items incorrectly; otherwise, it is scored 1.
^a Item difficulties and item discriminations are based on the total number of examinees (N = 415). ^b Item difficulties are based on the number of examinees who actually responded to the items.


Table 13

Item Analyses for Phoneme Blending Task Based on the Actual Item Condition

Item                  Item difficulty^a   Item difficulty^b   n^b   Item discrimination^a

/b – oi/ .183 .598 127 .706

/n – ç/ .065 .216 125 .577

/p – ö/ .067 .286 98 .656

/s – i – t/ .092 .396 96 .640

/f – l – î/ .087 .456 70 .755

/m – ou – s/ .082 .472 72 .662

/k – î – n – d/ .051 .313 67 .653

/s – n – a – p/ .043 .327 55 .679

/m – i – l – k/ .043 .316 57 .583

/s – l – i – p – çr/ .031 .236 55 .624

^a Item difficulties and item discriminations are based on the total number of examinees (N = 415). ^b Item difficulties are based on the number of examinees who actually responded to the items.


Table 14

Intercorrelations among the Phonological Awareness Tasks

Task 1 2 3 4

1. Rhyming Discrimination — .40 .36 .36

2. Syllables Segmentation — .43 .32

3. Initial Isolation — .51

4. Phonemes Blending —

Note. Computations are based on the actual item condition.


Table 15

Factors, Eigenvalues, and Percentage of Variance Accounted for

Factor Eigenvalue Percentage of Variance Total Variance

1 2.19 54.83 54.83

2 .73 18.29 73.12

3 .62 15.44 88.56

4 .46 11.44 100.00

Note. Factor analysis is conducted based on the actual item condition.
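The eigenvalues in Table 15 can be recovered, up to the two-decimal rounding of Table 14, directly from the task intercorrelation matrix. A minimal check:

    import numpy as np

    # Intercorrelations from Table 14 (actual item condition).
    R = np.array([[1.00, 0.40, 0.36, 0.36],
                  [0.40, 1.00, 0.43, 0.32],
                  [0.36, 0.43, 1.00, 0.51],
                  [0.36, 0.32, 0.51, 1.00]])
    eigenvalues = np.linalg.eigvalsh(R)[::-1]      # descending order
    print(eigenvalues.round(2))                    # approx. 2.19, 0.73, 0.62, 0.46
    print((eigenvalues / 4 * 100).round(2))        # percentage of variance, cf. Table 15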


Table 16

Factor Loadings for One-Factor Solution

Task Factor

Rhyming discrimination .70

Syllables segmentation .72

Initial isolation .79

Phonemes blending .75

Note. Factor analysis is conducted based on the actual item condition.


Table 17

Factor Loadings for Two-Factor Solution after Varimax Rotation

Task Factor 1 Factor 2

Rhyming Discrimination .20 .81

Syllables Segmentation .22 .81

Initial Isolation .79 .32

Phonemes Blending .88 .15

Note. Factor analysis is conducted based on the actual item condition.
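A two-factor varimax solution close to Table 17 can be approximated by rotating the first two principal-component loadings of the Table 14 matrix. The varimax routine below is the standard SVD-based algorithm; because the published correlations are rounded and the thesis's extraction was done in its own software, the loadings will agree only approximately, and factor order and signs are arbitrary:

    import numpy as np

    def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
        """Varimax rotation of a (p, k) loading matrix."""
        p, k = loadings.shape
        rotation, var = np.eye(k), 0.0
        for _ in range(max_iter):
            rotated = loadings @ rotation
            u, s, vt = np.linalg.svd(
                loadings.T @ (rotated ** 3
                              - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0))))
            rotation = u @ vt
            if var > 0 and s.sum() / var < 1 + tol:
                break
            var = s.sum()
        return loadings @ rotation

    R = np.array([[1.00, 0.40, 0.36, 0.36],
                  [0.40, 1.00, 0.43, 0.32],
                  [0.36, 0.43, 1.00, 0.51],
                  [0.36, 0.32, 0.51, 1.00]])
    vals, vecs = np.linalg.eigh(R)
    top = np.argsort(vals)[::-1][:2]
    loadings = vecs[:, top] * np.sqrt(vals[top])   # principal-component loadings
    print(np.abs(varimax(loadings)).round(2))      # absolute values; cf. Table 17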


Table 18

The Means and the Standard Deviations of Alphabet Knowledge Tests

Tests                              Max. Score   M       SD     N

Letter name knowledge-upper case 16 12.06 9.92 415

Letter sound knowledge-upper case 16 5.03 7.21 415

Letter name knowledge-lower case 16 9.55 8.80 415

Letter sound knowledge-lower case 16 4.03 6.67 415


Table 19

Predictive Correlations between Phonological Awareness Tasks and Alphabet

Knowledge Tests

Task                                Rhyming           Syllables        Initial          Phonemes
                                    discrimination    segmentation     isolation        blending
Letter name knowledge-upper case    .07 (p = .298)    .16 (p = .033)   .08 (p = .666)   -.03 (p = .767)
Letter sound knowledge-upper case   .05 (p = .434)    .20 (p = .006)   .25 (p = .139)   .12 (p = .300)
Letter name knowledge-lower case    .07 (p = .258)    .17 (p = .021)   .09 (p = .605)   -.06 (p = .629)
Letter sound knowledge-lower case   .05 (p = .479)    .20 (p = .007)   .25 (p = .140)   .15 (p = .179)
N                                   212               184              36               78

Note. Correlation coefficients are computed based on the number of examinees who actually responded to the items on the phonological awareness tasks.
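Because the n differs by task, each coefficient in Table 19 is a pairwise correlation over the examinees with a valid task score. A minimal sketch of one cell's computation, with illustrative variable names (not those of the study's data files):

    import numpy as np
    from scipy.stats import pearsonr

    def predictive_correlation(task_scores, alphabet_scores):
        """Pearson r and p over examinees with non-missing scores on both measures."""
        task = np.asarray(task_scores, dtype=float)
        alphabet = np.asarray(alphabet_scores, dtype=float)
        valid = ~np.isnan(task) & ~np.isnan(alphabet)
        r, p = pearsonr(task[valid], alphabet[valid])
        return r, p, int(valid.sum())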


Table 20

The Means and Standard Deviations of Phonological Awareness Tasks by Gender Group

                          Male                   Female
Task                      M      SD     N        M      SD     N

Rhyming discrimination 3.18 3.47 200 3.13 3.45 196

Syllables segmentation 1.89 2.70 199 1.73 2.40 195

Initial isolation 0.68 2.31 198 0.69 2.32 195

Phonemes blending 0.68 1.91 198 0.81 1.80 195

Note. The analysis is based on the total number of examinees in the actual item condition.


Table 21

The Means and the Standard Deviations of Phonological Awareness Tasks by Ethnic

Group

                         African-American   Asian          Bi-Racial     Caucasian     Hispanic
Task                     M      SD          M      SD      M      SD     M      SD     M      SD
Rhyming discrimination   2.64   3.23        1.94   2.82    3.75   2.87   3.48   3.58   4.02   3.85
Syllables segmentation   1.75   2.49        1.56   2.39    2.25   2.63   1.58   2.40   2.03   2.86
Initial isolation        0.67   2.35        0.00   0.00    0.00   0.00   0.50   1.89   1.12   2.96
Phonemes blending        0.63   1.79        0.19   0.75    0.75   1.50   0.75   1.75   0.88   1.97
N                        128                16             4             106           60

Note. The analysis is based on the total number of examinees in the actual item condition.


Table 22

The Means and the Standard Deviations of Phonological Awareness Tasks by

Socioeconomic Group

Lower group Upper group

Task M SD M SD

Rhyming discrimination 3.33 3.25 3.14 3.55

Syllables segmentation 1.86 2.63 1.79 2.56

Initial isolation 0.57 2.14 0.75 2.43

Phonemes blending 0.90 2.05 0.66 1.76

N 115 265

Note. The analysis is based on the total number of examinees in the actual item condition.

Socioeconomic status is based on whether the subject receives free or reduced-price lunch.


Figure 1

Developmental Sequence of Phonological Awareness

Age Development in phonological awareness tasks

3-year-olds Can recite nursery rhymes.

4-year-olds Can detect if two words rhyme.

Can produce a rhyme for a simple word.

5-year-olds Can understand the components of sounds that make them the

same or different.

Can isolate and pronounce the initial sound in a word.

Can blend and segment words into the syllabic units.

6-year-olds Can isolate and pronounce sounds in up to three-phoneme words.

Can blend the sounds in four-phoneme words.

7-year-olds Can manipulate phonemes, including adding, deleting, and moving

any phonemes to generate designated words.


Figure 2

Facets of the Unitary Validity

                      Test Interpretation                        Test Use

Evidential Basis      Construct Validity                         Construct Validity +
                                                                 Relevance and Utility

Consequential Basis   Construct Validity +                       Construct Validity +
                      Value Implications                         Relevance and Utility +
                                                                 Value Implications +
                                                                 Social Consequences


Figure 3

Plot of Eigenvalues and Factors of Scree Test

[Scree plot: eigenvalues (y-axis, 0.0 to 2.5) plotted against factor number (x-axis, 1 to 4); the plotted values are the eigenvalues reported in Table 15.]


Figure 4

The Procedure for Assessment Construction and Construct Validation

Content aspect
  Assessment construction:
    • Specifying cognitive outcomes / taxonomy of objectives
    • Table of specifications
    • Developing assessment tasks – construction of items
  Validation procedure:
    • Specifying domain of construct – previous research and observation
    • Construct underrepresentation and construct irrelevancy
    • Index of item congruence

Substantive aspect
  Assessment construction:
    • Developing answer keys
    • Developing scoring rubrics
    • Developing models for scoring
  Validation procedure:
    • Administering and scoring considerations
    • Evaluating assessment instruments – task reliability
    • Summarizing measurement data

Structural aspect
  Assessment construction:
    • Gathering information about item analysis
  Validation procedure:
    • Item and subscale intercorrelations
    • Item analysis
    • Factor analysis
    • Item response theory
    • Multitrait-multimethod matrix

Generalizability aspect
  Validation procedure:
    • Generalizability theory
    • Meta-analysis

External aspect
  Validation procedure:
    • Multitrait-multimethod matrix
    • Group differentiation
    • Correlations with other measures
    • Regression analysis
    • Structural equation modeling

Consequential aspect
  Assessment construction:
    • Selecting items from the information about item analysis and item bias detection
    • Developing question / item file
  Validation procedure:
    • Detecting item bias and fair selection
    • Evaluating intended / unintended consequences of score interpretation and use
    • Evaluating the impact of test invalidity


APPENDIX: PHONOLOGICAL AWARENESS TEST (PAT)

Ceiling for all subtests: Stop the administration if all three practice items are wrong, or after 3 consecutive wrong items. If the child is losing track of the task, go back to the example to remind the child of the task.
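For anyone coding administration records from this protocol, the ceiling rule can be expressed as a simple predicate. This sketch is illustrative only and is not part of the published test; the 0/1 coding and function name are assumptions:

    def discontinue(practice_scores, item_scores):
        """True when testing should stop under the ceiling rule above.

        practice_scores: the three 0/1 practice-item scores.
        item_scores: 0/1 test-item scores in administration order so far.
        """
        if sum(practice_scores) == 0:          # all three practice items wrong
            return True
        return len(item_scores) >= 3 and sum(item_scores[-3:]) == 0  # 3 in a row wrong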

Name: ______

Date of Administration: ______

Examiner: ______

Summary of Results

Test Raw Score

Rhyming Discrimination

Sentence Segmentation

Syllable Segmentation

Initial Isolation

Syllable Blending

Phoneme Blending

Consonants Graphemes

Long & Short Vowels Graphemes


Rhyming Discrimination

“I am going to say two words and ask you if they rhyme. Listen carefully. Do these words rhyme? Fan – man.”

Stimulus phrase: “Do these words rhyme? _____ - _____ ”

Practice items: 1. Fan – man (yes), 2. Fan – tan (yes), 3. Fan – dog (no).

Item               Correct Response   Examinee's Response   Score
book – look        Yes                                      1  0
fun – run          Yes                                      1  0
ring – rat         No                                       1  0
box – mess         No                                       1  0
fish – dish        Yes                                      1  0
mop – hop          Yes                                      1  0
shoe – fan         No                                       1  0
sweater – better   Yes                                      1  0
camper – hamper    Yes                                      1  0
pudding – table    No                                       1  0

TOTAL SCORE

TOTAL SCORE


Sentence Segmentation

“I am going to say a sentence, and I want you to clap one time for each word I say. My house is big. Now, clap it with me.” Say the sentence again and clap once as you say each word. “My – house – is – big. Now, you try it by yourself. My house is big.”

Stimulus phrase: “Clap one time for each word I say. ______”

Practice items: 1. My – house – is – big. (4 claps) 2. My – name – is – _____. (4 claps)

3. I – like – dogs. (3 claps)

Item                      Correct Response   Examinee's Response   Score

He can swim 3 claps 1 0

My cat is black 4 claps 1 0

I am very tall 4 claps 1 0

My dad’s car won’t start 5 claps 1 0

That flower is pretty 4 claps 1 0

Some cows give milk 4 claps 1 0

The clown has big feet 5 claps 1 0

Let’s go to school 4 claps 1 0

I have ten books 4 claps 1 0

The kite is flying high 5 claps 1 0

TOTAL SCORE


Syllable Segmentation

“I am going to say a word, and I want you to clap one time for each word part or syllable

I say. Saturday. Now, clap it with me.” Say the word and clap once as you say each syllable. “Sat – ur – day. Now, you try it by yourself. Saturday.”

Stimulus phrase: “Clap one time for each syllable in the word _____.”

Practice items: 1. Sat – ur – day (3 claps) 2. Fri – day (2 claps) 3. Dog (1 clap)

Item           Correct Response   Examinee's Response   Score
pizza          2 claps                                  1  0
watermelon     4 claps                                  1  0
fix            1 clap                                   1  0
calendar       3 claps                                  1  0
television     4 claps                                  1  0
moose          1 clap                                   1  0
elephant       3 claps                                  1  0
pillow         2 claps                                  1  0
kindergarten   4 claps                                  1  0
candy          2 claps                                  1  0

TOTAL SCORE


Initial Isolation

“I am going to say a word, and I want you to tell me the beginning or first sound in the word. What’s the beginning sound in the word CAT?”

Stimulus phrase: “What’s the beginning sound in the word _____?”

Practice items: 1. CAT /k/ 2. MAD /m/ 3. JANE /j/

Item       Correct Response   Examinee's Response   Score
bite       /b/                                      1  0
toy        /t/                                      1  0
dinosaur   /d/                                      1  0
fudge      /f/                                      1  0
nose       /n/                                      1  0
apple      /a/                                      1  0
garage     /g/                                      1  0
happy      /h/                                      1  0
chalk      /ch/                                     1  0
laugh      /l/                                      1  0

TOTAL SCORE


Syllable Blending

“I’ll say the parts of a word. You guess what the word is. What word is this?” Pause for one second between syllables. “ta – ble” If the child repeats the word in parts, say “Say it faster, like this, table.”

Stimulus phrase: “What word is this? _____ .”

Practice items: 1. ta – ble (table) 2. mo – ther (mother) 3. he – llo (hello)

Item                   Correct Response   Examinee's Response   Score
win – dow              window                                   1  0
flow – er              flower                                   1  0
can – dy               candy                                    1  0
com – pu – ter         computer                                 1  0
moun – tain            mountain                                 1  0
bas – ket              basket                                   1  0
tel – e – phone        telephone                                1  0
croc – o – dile        crocodile                                1  0
dic – tion – ar – y    dictionary                               1  0
con – ver – ti – ble   convertible                              1  0

TOTAL SCORE


Phoneme Blending

“I’ll say the sounds. You guess what the word is. What word is this?” Pause for one second between sounds. “p – o – p” If the child repeats the word by sounds, say, “Say it faster, like this, pop.”

Stimulus phrase: “What word is this? _____ .”

Practice items: 1. p – o – p (pop) 2. d – o – g (dog) 3. c – a – t (cat)

Item                  Correct Response   Examinee's Response   Score

/b – oi/ boy 1 0

/n – ç/ knee 1 0

/p – ö/ paw 1 0

/s – i – t/ sit 1 0

/f – l – î/ fly 1 0

/m – ou – s/ mouse 1 0

/k – î – n – d/ kind 1 0

/s – n – a – p/ snap 1 0

/m – i – l – k/ milk 1 0

/s – l – i – p – çr/ slipper 1 0

TOTAL SCORE


Consonants Graphemes – Discontinue if the child gets 8 consecutive letters wrong and

does not know those in his or her name.

“I’m going to show you some letters. I want you to tell me what sound each letter makes.”

Stimulus phrase: “Tell me what sound this makes.”

Note: If the student gives one correct sound of /c, g, s/, prompt for the other sound by asking, “What’s another sound this makes?” If the student is able to provide one correct sound, score the item as correct.

Use the graphemes booklet for this subtest.

Item   Correct Response   Score      Item   Correct Response       Score
b      /b/                1  0       n      /n/                    1  0
c      /k, s/             1  0       p      /p/                    1  0
d      /d/                1  0       q      /k, kw/                1  0
f      /f/                1  0       r      /r/                    1  0
g      /g, j/             1  0       s      /s, z/                 1  0
h      /h/                1  0       j      /j/                    1  0
t      /t/                1  0       v      /v/                    1  0
k      /k/                1  0       w      /w/                    1  0
l      /l/                1  0       x      /eks, z, ks/           1  0
m      /m/                1  0       z      /z/                    1  0

TOTAL SCORE

Long & Short Vowels Graphemes

Use the same vowel card to elicit both the short and long vowel sounds below. If necessary, prompt with “Now, tell me the other sound this letter makes.”

Note: Use the vowel sounds booklet for this subtest.

Item   Correct Response        Examinee's Response   Score

A /a/ as in bat 1 0

A /â/ as in cake 1 0

E /e/ as in met 1 0

E /ç/ as in me 1 0

I /i/ as in sit 1 0

I /î/ as in high 1 0

O /o/ as in top 1 0

O /ô/ as in over 1 0

U /u/ as in but 1 0

U /û/ as in use or tool 1 0

TOTAL SCORE