
CREATION OF A MORE ACCURATE AND PREDICTIVE TRAIL MAKING TEST

Brian T. Smith

A Thesis Submitted to the University of North Carolina Wilmington in Partial Fulfillment of the Requirements for the Degree of Master of Arts

Department of Psychology

University of North Carolina Wilmington

2011

Approved by

Advisory Committee

Jeffrey Toth

Alissa Dark-Freudeman

Karen Daniels

Accepted by

Dean, Graduate School

TABLE OF CONTENTS

ABSTRACT

ACKNOWLEDGEMENTS

LIST OF TABLES

INTRODUCTION

About the Trail Making Test

TMT as a Predictor of MCI and AD

Limitations of the TMT

Rationale for the Study

Summary of Hypotheses

METHOD

Participants

Materials

Procedure

RESULTS

Performance on Individual Tasks

Task Reliability

Correlations Among Trails Tasks

Correlations of Trails to Criterion Measures

DISCUSSION

CONCLUSIONS

REFERENCES


ABSTRACT

The goal of this research was to create and evaluate a computerized touch-screen version of the popular Trail Making Test (TMT). The TMT is a pen-and-paper test that has been used for decades to measure individual differences in executive functioning and to help identify cognitive deficits associated with aging and dementia. Our computerized variant, called eTrails, was aimed at addressing some of the limitations of the original TMT and improving both its reliability and predictive accuracy. Two additional eTrails variants were also created that manipulated aspects of the task thought to drive its predictive power; namely, the ability to block out distraction (eTrails-Flash) and visual search ability (eTrails-Scramble). All variants of eTrails demonstrated increased reliability relative to the TMT, and most showed strong inter-task correlations; however, relationships between eTrails and well-known measures of executive functioning were generally nonsignificant. Potential explanations for the failure to find increased predictive power for this more reliable TMT variant are discussed.


ACKNOWLEDGEMENTS

I would like to thank Dr. Karen Daniels for her mentorship and constant guidance throughout my graduate career. You have been an excellent role model as a scientist and a person. To my committee members, Dr. Jeffrey Toth and Dr. Alissa Dark-Freudeman, thank you for your invaluable assistance and suggestions. I would like to especially thank Dr. Toth for his contributions to designing and coding eTrails. I could not have accomplished what I have without the three of you.

Finally, I would like to thank my parents, Thomas and Katherine Smith, for instilling a love of science and learning in me. Your love and support through the years have made me the man I am today.


LIST OF TABLES

Table

1. Trail Making Test Scores

2a. Form "A" Statistics for eTrails

2b. Form "B" Statistics for eTrails

3. Divided Attention Task Results

4. Ospan Results

5. Stroop Results

6a. Reliability for Trails

6b. Reliability for Criterion Measures

7. Correlations Between Trails Tasks

8. Correlations Between Trails Tasks and Criterion Measures

9. Correlations Between Trails Tasks Using Subtraction Scores

10. Correlations Between Trails Tasks and Criterion Measures Using Subtraction Scores


INTRODUCTION

Old age can be a time of fulfillment and enjoyment; retirement frees up time to engage in leisure activities made impossible by careers, and more time can be spent with family and friends. Unfortunately, many older adults never have the opportunity to enjoy the benefits of late life because it is also a time of increased vulnerability to a number of severe disorders. One of the most prevalent and debilitating age-related disorders is Alzheimer's Disease.

Alzheimer's Disease (AD) is a "degenerative brain disorder in which neurons, the specialized cells of the brain that process information, stop functioning properly" (Caroli & Frisoni, 2009, p. 570). The 2011 Facts and Figures report from the Alzheimer's Association stated that approximately 5.4 million people in the United States are currently living with AD, making it the sixth leading cause of death for Americans. The incidence of AD rises sharply with age: only 2% to 5% of 65-year-olds show signs of AD, but 25% to 50% of those 85 and older show symptoms. More alarming, the report revealed that, while most major causes of death (e.g., heart disease, many cancers, stroke, and HIV/AIDS) are on the decline, deaths from AD have increased 66% in recent years because there is no known cure and no clear method of prevention. As a result, intervention for AD has tended to focus on early detection. Diagnosing AD early has many benefits, including enhanced medical care, preparation for the lifestyle changes that must accompany eventual cognitive decline, and interventions that slow cognitive decline at the earliest possible stage (Caroli & Frisoni, 2009). Unfortunately, detecting AD and providing a clear diagnosis can be very difficult in the early stages of the disorder.

One approach to early detection of AD is identifying cognitive precursors of the disease in pre-clinical individuals (Balota, Tse, Hutchison, Spieler, & Morris, 2010). Mild cognitive impairment (MCI) is defined broadly as cognitive decline that is greater than would normally be expected for an individual at a certain age or education level, but that does not affect daily activities in notable ways (Gauthier, Reisberg, Zaudig, Petersen, et al., 2006). MCI typically presents itself as minor memory lapses (amnestic MCI) with normal thinking and reasoning skills. Evidence suggests that the brains of individuals who suffer from MCI are neurobiologically different from those of individuals without such cognitive impairment, and that the changes in the brains of MCI patients are similar to those seen in AD but on a less severe scale (Harotunian, Hoffman, & Beeri, 2009). These findings suggest that MCI is likely associated with the early stages of AD and that being able to effectively identify individuals with MCI might serve as an "early warning system" for dementia.

While there are a number of neuropsychological measures known to be sensitive to MCI, a simple paper-and-pencil measure known as the Trail Making Test (TMT) has proven to be one of the most widely used and one of the most sensitive to the onset of the disease (Blacker, Lee, Muzikansky, Martin, Tanzi, & McArdle, 2007; Chen, Ratcliff, Belle, Cauley, DeKosky, & Ganguli, 2001; Johnson, Lui, & Yaffe, 2007; Storandt, 2008).

About the Trail Making Test

The Trail Making Test (TMT) is an "efficient and sensitive instrument that is easily administered, and which reliably discriminates between normal individuals and those with brain impairment" (Arbuthnott & Frank, 2000, p. 312). It is considered to be one of the best measures of general brain functioning (Reitan & Wolfson, 1985; Mitrushina, Boone, & D'Elia, 1999). The TMT was created in 1938, was originally called "Partington's Pathways", and was included in the Army Individual Test Battery as well as the Halstead-Reitan Battery. The TMT is a two-part pen-and-paper test that is believed to measure visual-motor functioning, symbol recognition, the ability to scan a page, the flexible integration of numerical and alphabetical information under time pressure, as well as executive abilities such as sequencing and mental flexibility (Reitan & Wolfson, 1985).

The original TMT included two versions, A and B. In Trails A, individuals are given a sheet of paper containing circles with the numbers 1 to 25 arranged randomly on the page and are asked to rapidly connect the numbers in sequential order. Trails B, shown in Appendix A, also requires individuals to connect 25 target circles, but this time alternating between numbers and letters in ascending order (i.e., connecting 1, then A, then 2, then B, etc.). Trails B is thought to measure a more complex set of cognitive abilities that include planning, sequencing, updating working memory, and shifting between two stimulus domains (Arbuthnott & Frank, 2000; Lezak, Howieson, & Loring, 2004; Strauss, Sherman, & Spreen, 2006). It is important to note that these are all executive abilities that are often found to decline in older adults (Gaudino, 1995) and they are also among the first to show decrements as a function of MCI (Reitan, 1985). Trails B is one of the few neuropsychological measures that is able to differentiate between dementia patients and control subjects (Cahn, Salmon, Butters, Wiederholt, Corey-Bloom, Edelstein, & Barrett-Connor, 1995).
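The alternating number-letter order of Trails B can be sketched in a few lines of Python. This is illustrative only; the function name and the 13-number/12-letter split that yields 25 targets are our reading of the standard form.

```python
import string

def trails_b_sequence(n_numbers=13):
    """Build the alternating Trails B target order: 1, A, 2, B, ...
    With 13 numbers and 12 letters this gives the 25 targets
    described in the text."""
    seq = []
    for i in range(n_numbers):
        seq.append(str(i + 1))                     # the number comes first in each pair
        if i < n_numbers - 1:
            seq.append(string.ascii_uppercase[i])  # then its paired letter
    return seq
```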


Trails A is generally treated as a baseline condition where response latency is believed to reflect simple reaction time (RT). Unlike Trails B, successful performance on Trails A has been shown to rely very little on executive abilities (Arbuthnott & Frank, 2000). By comparing an individual's performance on the A (non-executive) and B (executive) versions, one is able to generate two critical measures of the individual's cognitive/executive capacity: the difference between Trails A and Trails B (how much longer it took to complete Trails B) and the B/A ratio. Slowed performance on Trails B relative to Trails A is used as an indication of cognitive impairment or general frontal lobe dysfunction. There have also been attempts to establish normative completion times for Trails B: a time of less than 72 seconds is considered normal performance, 73-105 seconds is considered mild impairment, and anything beyond 106 seconds is considered serious impairment. Still, the most commonly used measure remains the difference score between Trails A and Trails B because it is the most general and conservative; difference scores therefore comprise the primary dependent variable for the current study.
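These derived measures are simple to compute. The sketch below applies the difference score, the B/A ratio, and the normative Trails B cutoffs quoted above; the boundary handling at exactly 72 and 105 seconds is our assumption, since the text leaves those edges unspecified.

```python
def trails_scores(time_a, time_b):
    """Two standard TMT composite measures (completion times in seconds)."""
    return {"difference": time_b - time_a,  # B minus A: extra executive cost
            "ratio": time_b / time_a}       # B/A ratio

def classify_trails_b(time_b):
    """Normative Trails B cutoffs cited in the text."""
    if time_b <= 72:
        return "normal"
    elif time_b <= 105:
        return "mild impairment"
    else:
        return "serious impairment"
```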

TMT as a Predictor of MCI and AD

Prior research has also illustrated the validity of the TMT as a predictor of the cognitive deficits associated with MCI; these include the ability to maintain focus on a goal despite distractions as well as the ability to alternate attention between two different goals (Arbuthnott & Frank, 2000). More recent studies have further demonstrated the predictive power of the TMT in diagnosing MCI by extending these findings to the early stages of AD, where afflicted individuals show both longer reaction times and increased error rates compared to healthy individuals (Ashendorf, Jefferson, O'Connor, Chaisson, Green, & Stern, 2008). A final source of evidence for the utility of the TMT comes from the neuroscience literature. A recent fMRI study, for example, found that TMT performance was associated with significant increases in blood flow to the prefrontal cortex, a brain region known to underlie many of the executive abilities found to decline with MCI and AD (Kubo, Shoshi, Kitawaki, Takemoto, Kinugasa, Yoshida, Honda, & Okamoto, 2008).

However, while a general relationship between the TMT and these frontal/executive regions is evident, it is still not clear exactly which executive abilities are being taxed. Given that the prefrontal cortex has been linked to a broad array of higher-order abilities, it is not clear whether the deficits in TMT performance and the corresponding increase in blood flow to frontal regions are due to visual search, blocking out distractions, planning, etc. Prior TMT research provides conflicting views regarding which factors drive performance (Cubillo, 2009; Arbuthnott & Frank, 2000), and this controversy provides one of the motivations for our creation of a computerized version of the TMT.

Limitations of the TMT

Although the Trail Making Test is one of the most commonly used tests for diagnosing MCI, its utility is limited by several factors having to do with test design and administration. A number of these limitations are created by its reliance on a pen-and-paper format. First, repeated use of the test (critical for detecting the types of within-individual changes associated with the early stages of AD) is highly limited with current pen-and-paper versions (Salthouse, Toth, Daniels, et al., 2000). Most notably, there are only two alternate forms of Trails A and B (all testing is done with the same two arrangements of circles). The lack of alternate forms makes it difficult to experimentally investigate the relevant cognitive processes underlying task performance, because the various factors that may drive performance (e.g., circle arrangement, distance between circles, etc.) cannot be systematically manipulated.

Second, research has shown that there are significant practice effects with repeated administration of the existing forms (e.g., Buck, Atkinson, & Ryan, 2008); these practice effects are evident after only two exposures in the same day (Franzen, 1996) and can be detectable up to one year after initial administration (Basso, Bornstein, & Lang, 1999). Such practice effects represent one of the most pervasive problems when utilizing within-subject research designs. Any improvement observed during a second administration of the test may simply be due to prior exposure, making it difficult both to interpret performance gains and to establish reliability. Measures of executive function often show large practice effects because they typically present subjects with novel situations in which they must solve a problem or recognize an abstract concept. After the first administration of the test, subjects know all of the "tricks"; the novelty wears off quickly and they are able to refine their strategies, thereby improving test scores. Trails B is even more affected by practice effects than Trails A because of the increased novelty associated with doing this particular task (Franzen, 1996). These findings substantially undermine the diagnostic sensitivity of the TMT, especially in cases of AD where intra-individual changes in performance are thought to be one of the most sensitive indicators of abnormal cognitive decline.

A final notable limitation of the TMT is linked to its requirement for individuals to manually connect the circles with a pencil line or "trail". This requirement adds time and variability to performance due to a number of factors unrelated to the cognitive processes of interest (i.e., those affected in early AD), such as handedness, arthritis, and general dexterity. In addition, reliability scores can vary greatly based on administrative errors. One such error is the failure of the examiner, after a mistake, to correctly return the subject's pencil to the place from which the subject began drawing the incorrect trail. It is also common for subjects to not fully understand the directions of the test before beginning (Arbuthnott & Frank, 2000). Individuals who are instructed thoroughly and who are given sufficient practice prior to beginning the actual task demonstrate a significant time advantage over those who are not. These limitations complicate interpretation of TMT performance: when impaired performance is observed, it is not clear whether such impairments reflect a true cognitive deficit or more superficial problems related to task administration or format. The current research aims to standardize the procedures involved in Trails administration with the goal of increasing its diagnostic power.

Rationale for the Study

With the above issues in mind, Dr. Jeffrey Toth and I created a computerized version of the TMT called "eTrails" that uses touch-screen technology. This computerized task embodies the same general methods and principles as the original pen-and-paper version with the key exception that, rather than connecting circles with a pen on paper, participants touch targets arranged on a computerized display in the specified order. One goal of eTrails is to address some of the limitations of the existing TMT described above. First, it allows researchers to change the location of the targets on the screen (the letters and numbers), thereby substantially reducing the problem of practice effects and opening up the possibility of multiple testing sessions for the same individual. As stated earlier, the TMT has only two different forms; eTrails currently has over 30. As noted by Buck, Atkinson, and Ryan (2008), the most effective way to determine whether an individual's change in performance from one testing session to the next is meaningful is by conducting a "test-retest score difference using alternate and theoretically equivalent forms" (p. 312). A computerized version of the TMT that provides a number of different, but equivalent, forms allows us to examine such test-retest reliability. Finally, comparing the original pen-and-paper version of the TMT with eTrails in the current study will provide a direct test of the effects of test format (paper vs. computer) on TMT performance and may establish computerized (touch-screen) testing as an easier and more reliable method of responding that overcomes limitations related to the dexterity of the subject. This ease of testing also potentially allows for the inclusion of more varied research and clinical populations.

It should be noted that computerized versions of the TMT have been attempted previously (Drapeau, Bastien-Toniazzo, Rous, & Carlier, 2007; Kubo, 2009). However, the current study differs from this earlier research in two important ways. First, the earlier studies were simply direct replications of the TMT and did not fully take advantage of the change in format. The current study is designed to go beyond these direct replications and to use computerization to address some of the aforementioned limitations of the pen-and-paper format. A second, exciting difference is that the computerized variants allow us to make systematic changes to the task with the ultimate goal of further improving its predictive power. These changes will be discussed in the following sections. eTrails' computerized format (1) is millisecond-accurate and thus can detect much smaller differences in performance across individuals; (2) introduces the possibility of additional measures of performance that go beyond those derived from the TMT and that may be related to early AD (e.g., time to first touch, average touch time, time between touches as a function of stimulus class, etc.); and (3) permits various aspects of the task to be experimentally manipulated with the goal of increasing the cognitive, or executive, control needed to perform the task, allowing the effects of these changes to be directly assessed. The specific manipulations are discussed in the next sections.

eTrails-Standard

The first computerized TMT variant, eTrails-Standard, is similar in structure and procedure to the original TMT with the exception that different arrangements of numbers and letters are afforded by the computerized format. The goal of eTrails-Standard is to keep the critical aspects of the task as close to the original TMT as possible such that any observed performance differences between the two are attributable to the format change. If eTrails-Standard is found not to correlate with the original TMT, this would suggest that the pen-and-paper format and/or the fixed configuration of targets may have contributed to the validity of the original task. This version of eTrails is expected to show a moderate correlation with the paper-and-pencil TMT, indicating that they are tapping into the same executive abilities. It should also have higher correlations with other executive measures compared to the TMT, given that it is expected to be less hindered by the structural and procedural limitations mentioned above.


eTrails-Flash: Random and Next

The second and third eTrails variants involve attentional capture. Capture occurs when an aspect of one's environment (e.g., a horn, a flashing light, etc.) automatically draws one's attention, sometimes despite the intention to ignore it. There is increasing evidence that loss of attentional control occurs as a function of healthy aging (Jacoby, Bishara, Hessels, & Toth, 2005) and that these deficits are particularly pronounced even in the early stages of AD (Castel, 2009). Older adults with dementia seem particularly vulnerable to what Daniels, Toth, and Jacoby (2006) call "goal neglect", in which a distractor can derail their attention from the task at hand. In the current study, capture was increased relative to the original TMT by making one of the square targets on the computer screen flash briefly (quickly change from red to white and back again). The first version of this task, referred to as "FlashRandom", involves having a random incorrect square flash white briefly as participants are trying to respond to each target (i.e., the flashed number is not predictably related to the target response). The expectation is that, when an individual is searching for the next letter/number in the sequence, the flashing of an incorrect letter/number will draw attention away from the task goal. Thus, to avoid clicking on the flashing incorrect target, the individual must exert executive control; this should further distinguish those who may be more prone to the increased capture typical of dementia.
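The FlashRandom rule amounts to sampling a distractor on each step. A minimal sketch follows; the function name and the label representation are hypothetical, and we assume only that the flashed square is never the currently correct target.

```python
import random

def flash_random(all_labels, correct_target):
    """Pick one incorrect square to flash while the participant
    searches for `correct_target` (FlashRandom rule: the flashed
    label is never the currently correct one)."""
    distractors = [lab for lab in all_labels if lab != correct_target]
    return random.choice(distractors)
```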

The second capture variant, "FlashNext", is similar to the "FlashRandom" variant described above in that a square flashes while the participant responds; however, in this version it is the next target square in the sequence that flashes. For example, as a participant selected the "2" target square in the Trails A task, the "3" square would flash. In this case, the flash provides the participant with the correct answer and, unlike the "FlashRandom" variant, may actually facilitate Trails performance. This change may be diagnostic as well: participants who struggle on the "FlashNext" variant may demonstrate the most significant deficits on other tasks. If they react slowly in a facilitating, predictive environment, they will almost certainly be slow under conditions of distraction.

eTrails-Scramble

In the final eTrails variant, a correct button press on any trial results in all of the other, non-target labels switching positions. The buttons themselves stay in the same physical locations on the screen, but the labels of the buttons (the numbers 2 through 16) randomly switch positions with one another (the "2" may move to where the "4" once was, the "4" may move to where the "16" previously was, etc.). This scrambling is intended to remove the ability of participants to plan their next selection in advance by visually scanning a fixed arrangement of squares. Visual search is considered by many to fall under the umbrella of executive functioning (e.g., Kubo et al., 2008), and the ability to select targets from a visual display appears to decline in AD (Castel, Balota, & McCabe, 2009). By preventing "look ahead" with this scramble variant, it is expected that participants will require more cognitive effort to search for the next target, placing pressure on those suffering from executive declines.
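A minimal sketch of the scramble rule, assuming buttons are keyed by fixed screen positions, the pressed button is retired, and only the label-to-position mapping is reshuffled after each correct press (all names are hypothetical):

```python
import random

def scramble_labels(assignment, pressed_label):
    """`assignment` maps fixed button positions to labels. After a
    correct press, the pressed button is retired and the remaining
    labels are randomly reassigned among the remaining positions."""
    remaining = {pos: lab for pos, lab in assignment.items()
                 if lab != pressed_label}
    labels = list(remaining.values())
    random.shuffle(labels)  # labels move; button positions do not
    return dict(zip(remaining.keys(), labels))
```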

Dividing Attention as a Proxy for Executive Deficits

As stated above, one of the goals of this study is to improve upon the predictive strength of the TMT, especially as it relates to early diagnosis of the kinds of cognitive deficits found in MCI and Alzheimer's disease. Unfortunately, patient populations are difficult to access and are impractical for the purposes of establishing the validity and reliability of our eTrails variants. Thus, it became necessary to find a way to mimic cognitive deficits in the laboratory as a proxy for testing patients. We accomplished this by using a divided attention (DA) paradigm, also known as the secondary-task or dual-task technique (Posner & Boies, 1971). Divided attention manipulations require an individual to allocate some of their attentional resources to a simultaneous secondary task, preventing them from fully attending to the primary task. Research consistently shows that dividing attention results in significantly poorer performance on the primary task relative to full attention (FA), consistent with the idea that DA uses up attentional resources (Anderson, Craik, & Naveh-Benjamin, 1998).

There is evidence to suggest that older adults have fewer resources to devote to a task when compared to younger adults, and thus older adults often show greater costs of dividing attention (for a review, see Verhaeghen, Steitz, Sliwinski, & Cerella, 2003). Indeed, requiring young adults to perform under conditions of divided attention tends to result in memory performance very similar to that of older adults performing with full attention (Skinner & Fernandes, 2009). Research also suggests that there is a marked impairment in the ability of Alzheimer's patients to coordinate the performance of two simultaneous tasks, and that AD may entail a severe dual-processing deficit not observed to the same degree in normal aging (Baddeley, Baddeley, Bucks, & Wilcock, 2001). The current study employs dual-task costs, the decline in performance for younger adults under divided attention compared with full attention, as a way to mimic cognitive decline. Younger adults who suffer greatly under divided attention provide a suitable analog for older adults with MCI.

Summary of Hypotheses

Participants completed the A and B versions of the original TMT, along with four computerized variants (eTrails-Standard, eTrails-FlashNext, eTrails-FlashRandom, and eTrails-Scramble). Participants were also given a brief battery of memory, attention, and cognitive speed measures. For the divided attention task, the difference between the two conditions (full attention performance minus divided attention performance) was used as a proxy for the kinds of cognitive decline observed in AD. The main hypotheses under investigation were that (1) eTrails-Standard would demonstrate stronger correlations with the various executive criterion measures than the pen-and-paper TMT, which is hindered by the limitations described above; and (2) the eTrails-Scramble and eTrails-FlashRandom variants would be stronger predictors than eTrails-Standard because they are intended to directly tax those executive processes believed to drive TMT performance; any differences between them would inform the relative importance of capture and visual search to the TMT. Moreover, the use of the divided attention manipulation would provide evidence of the predictive accuracy of these various measures along a spectrum of cognitive ability.

METHOD

Participants

The sample for the current study consisted of 43 young adults (22 females, 21 males). All were UNCW undergraduates who voluntarily signed up through the Psychology Department's research system. Two subjects' results were excluded due to large error rates on the TMT that made their data impossible to score. The data from the forty-one remaining subjects were statistically analyzed.

Materials

The main experimental tool for this study was "eTrails 1.0". The program was built using Microsoft Visual Basic version 6.0 and consists of a 576 x 792 pixel window on which sixteen 50 x 50 pixel buttons are presented. Unlike the pen-and-paper TMT, which has only 2 forms, eTrails utilized 32 different forms in the current study (16 practice forms and 16 full forms). The practice forms each contain 6 squares, while the full-length forms contain 16. Each form was created by dividing the computer screen into 4 sections, and each section again into 4 subsections (Appendix B). Dice-rolling software was used: based on two dice rolls, each square was assigned to an area (in the appendix, green boxes mark the first subsection, blue the second, black the third, and orange the fourth). For example, if a '1' and then a '4' were rolled, the square would be placed somewhere in the 4th box in the 1st subsection. Slight manual adjustments were made to the assigned positions only under the following conditions: (a) squares were overlapping or too near to one another (closer than 50 pixels); (b) two sequential numbers were placed directly next to one another; or (c) the resultant pattern of placement was too easy (e.g., all odd numbers on the top and all even numbers on the bottom). This process was repeated for every configuration such that no two layouts ended up the same.
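The dice-roll placement can be sketched as follows. This is one plausible reading of the procedure: the first roll picks a column of the 4 x 4 grid, the second a row, and the 50-pixel spacing from condition (a) is enforced by re-rolling; all names are ours.

```python
import random

W, H, SIZE, MIN_GAP = 576, 792, 50, 50  # window and button geometry from the text

def cell_bounds(section, subsection):
    """Map two 1-4 'dice rolls' onto one cell of a 4 x 4 screen grid."""
    cw, ch = W // 4, H // 4
    return (section - 1) * cw, (subsection - 1) * ch, cw, ch

def place_square(occupied):
    """Roll a cell, pick a point inside it, and retry until the square
    is at least MIN_GAP pixels from every previously placed square."""
    while True:
        x0, y0, cw, ch = cell_bounds(random.randint(1, 4), random.randint(1, 4))
        x = x0 + random.randint(0, cw - SIZE)
        y = y0 + random.randint(0, ch - SIZE)
        if all(abs(x - ox) >= MIN_GAP or abs(y - oy) >= MIN_GAP
               for ox, oy in occupied):
            occupied.append((x, y))
            return x, y
```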

When the eTrails program is initiated, there is a prompt to enter an identification number, the participant's age, and the participant's gender. Once all the values are entered, the participant presses a "begin" button and the test initiates. Before each full round, the program includes a 6-button practice session. The program records the response time both between buttons and for the entire trial, as well as the total number of errors. eTrails was designed for use on touch-screen computers, and all variants were administered using an ASUS "EeeTop PC ET1602". In addition to eTrails and the original pen-and-paper TMT described above, several cognitive tests were used as outcome measures to assess the degree to which these Trails variants can successfully predict various forms of higher-order cognition.

Operation Span (Ospan)

Ospan is a common measure of working memory capacity (Conway, Kane, Bunting, Hambrick, Wilhelm, & Engle, 2005; Turner & Engle, 1989). Working memory is a limited-capacity system involved in the storage and manipulation of information in the service of complex goals (Baddeley & Hitch, 1974; Engle, 2002). Working memory tasks have been shown to be highly predictive of performance on a variety of laboratory and real-world tasks (Conway et al., 2005). Importantly, performance on span tasks shows marked impairment even in the early stages of Alzheimer's Disease (Kensinger, Shearer, Locascio, Growdon, & Corkin, 2003; Rosen, Bergeson, Putnam, Harwell, & Sunderland, 2002). While performing Ospan, participants were asked to recall a list of letters while simultaneously solving simple math equations. On each equation-letter trial, participants were given a math problem followed by a letter (e.g., "Is (6 x 2) - 5 = 7? Q"). The participants were instructed to read the math equation aloud, to respond "yes" or "no" to its correctness, and finally to read the letter aloud and try to remember it for a later test. After some number of these equation-letter trials, a recall cue appeared on the screen (i.e., "???") and participants attempted to recall as many letters as possible in the order in which they were presented by writing their responses on an answer sheet. Set sizes ranged from two to five letters across a total of 12 sets. An individual's span score was determined by the number of letters they correctly recalled in order.
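Scoring can be expressed compactly. The sketch below credits each letter recalled in its correct serial position; the text says only "correctly recalled in order", so strict position-matching is our assumption.

```python
def ospan_score(presented, recalled):
    """Count letters recalled in their correct serial positions
    (credit is per position; non-matching positions score zero)."""
    return sum(p == r for p, r in zip(presented, recalled))
```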

Color-Word Binary Stroop

The Stroop task is considered the "gold standard" measure of attentional control (MacLeod, 1992). In this task, participants were presented with a color word (either the word "red" or the word "blue") printed in a colored font (e.g., the word RED presented in blue font; the word BLUE presented in blue font) and were instructed to name the color of the font ("blue" in both cases) as quickly as possible. Participants were told to respond only to the color of the font, and not to the word itself. A Labtec AM-22 microphone was used to record reaction time. The experimenter recorded the accuracy of responses by pressing the "1" key when the participant responded "red", the "2" key for "blue", and the "3" key for discarded trials. Discarded trials included partial responses ("bl-red"), stutters ("r...r...red"), and extraneous noises and movements that inadvertently triggered the microphone (e.g., coughing, exhaling, etc.).

Participants completed 155 total trials: 95 congruent (where the color and word matched), 30 incongruent (where the color and word were different), and 30 neutral (which involved ampersands in different colors). Stroop is a good measure given the goals of the current study because increases in interference scores on the Stroop task are regularly observed for Alzheimer's patients (Bondi, Serody, Chan, Eberson-Shumate, Delis, Hansen, & Salmon, 2002; Spieler, Balota, & Faust, 1996), and Stroop variants with a large proportion of congruent trials have been shown to be particularly taxing to executive processes like working memory (Kane & Engle, 2003).


Recognition Task with Full vs. Divided Attention

Recognition is a measure of long-term episodic memory. For this study we used a recognition procedure similar to that of Jacoby, Toth, and Yonelinas (1993). Participants were shown five-letter nouns, one at a time, on the computer screen and were asked to read them aloud and to remember them for a later test. During the divided-attention portion of the task, participants were further instructed that they would also be performing a listening task (Craik, 1982). The listening task involved monitoring a computer-recorded audio file in which a list of numbers was read aloud at a 2-second pace. Participants were instructed to respond "now" every time they heard the target sequence: three odd digits in a row. The list of numbers conformed to two rules: (1) each target sequence had at least two even numbers between it and the next target sequence, and (2) the interval between target sequences varied in length in order to be unpredictable (this interval ranged from two to six numbers). During the test phase participants were again shown five-letter words. Some of these words had appeared on the study list; others were new words not encountered previously in the task. Participants discriminated between these old and new items by pressing the "1" key if an item was on the previous list and the "2" key if it was new.
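The target-detection rule for the listening task can be made concrete with a short sketch (hypothetical illustration only; the study used a pre-recorded audio list, and the function name is ours):

```python
def find_targets(digits):
    """Return the indices at which a target sequence (three odd
    digits in a row) is completed in the digit stream."""
    targets = []
    for i in range(2, len(digits)):
        if all(d % 2 == 1 for d in digits[i - 2:i + 1]):
            targets.append(i)
    return targets

# A stream with two targets, (1, 3, 5) and (7, 9, 1), separated by
# two even digits, consistent with the construction rules above.
stream = [1, 3, 5, 2, 4, 7, 9, 1]
print(find_targets(stream))  # -> [2, 7]
```

A participant saying "now" at each returned index would be performing the listening task correctly.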

Other Tests

Two final tests were used to ensure that our young sample was relatively representative of their age group. One was the Shipley Test, a common pen-and-paper test of vocabulary in which participants are shown 40 increasingly challenging words (e.g., "talk" is an early word and "querulous" a later one) and are asked to choose, from among four alternatives, the word most similar in meaning to the target. The other was Letter Comparison, a computerized measure of speed of cognitive processing in which participants are asked to quickly compare two strings of letters and to indicate with a key-press whether they are the same (by pressing the "S" key) or different (the "D" key).

Procedure

When participants arrived in the laboratory, they were directed to a separate testing room. They completed consent and demographics forms and were given a general introduction to the design and goals of the study. They were then given the Trails variants in the following order: paper-and-pencil TMT, eTrails-Standard, eTrails-FlashNext, eTrails-Scramble, and eTrails-FlashRandom. For each, they received the A version directly followed by the B version. Following completion of all the Trails tasks (which took approximately 30 minutes), participants completed the other cognitive measures in the following order: Recognition Full Attention, Recognition Divided Attention, Ospan, [5-10 minute break], and Stroop. For the purposes of establishing reliability, participants then completed a second administration of each version of Trails. Note that different layouts of letters and numbers were used for the second administration of each task to reduce practice effects. Lastly, participants completed Letter Comparison and Shipley.

The entire session took approximately one and a half hours. Participants were given specific instructions prior to each task and the opportunity to ask clarifying questions. Upon completion of all of the tasks, participants were debriefed, thanked for their participation, and escorted out of the laboratory.


RESULTS

Performance on Individual Tasks

The Trail Making Test

The reaction time (RT) data for the paper-and-pencil TMT are summarized in Table 1. A 2 x 2 repeated-measures ANOVA was conducted with Test Type (Trails A vs. B) and Time of Administration (Time 1 vs. 2) as within-subjects factors. The effects of Test Type (F(1, 160) = 12.56, p < .005) and Time of Administration (F(1, 160) = 161, p < .001) were both significant. The interaction approached, but did not reach, significance (F(1, 160) = 3.44, p = .065). The significant effect of Test Type shows that the "B" versions of the task took significantly longer than the "A" versions. The significant effect of Time of Administration indicates substantial practice effects, with consistently slower RTs for the first administration of each task. Note also that there was considerable variability in performance on the B forms relative to the A forms, as indexed by the larger range of scores across participants (SD = 17,046ms for the first administration).

Focusing on the error rates for the TMT, only four of the 41 subjects made no errors on the TMT (9.75%). Participants made an average of .89 errors per condition (approximately 3.56 per person). A 2 x 2 ANOVA with Task Type and Time of Administration as factors was conducted on the error data. There were no significant main effects or interactions. The most common error was not fully connecting the line to the target circle; the subject's intended target was usually clear, but the pen line would not cross the boundary of the circle. In addition to reducing accuracy scores, this error likely also impacted participants' reaction times, because it makes the overall path each subject takes shorter than it should be and therefore potentially shortens their RT.

eTrails

The eTrails reaction time data for the "A" forms can be found in Table 2a (top panel) and the corresponding data for the "B" forms in Table 2b (bottom panel); A1 indicates the first administration of each task and A2 the second. A one-way ANOVA comparing the "A" forms revealed significantly different RTs across the eTrails variants (F(7, 320) = 59.15, p < .05). The Scramble variant produced the longest average RT (19,508ms) and the FlashNext variant the fastest (11,789ms).

Like the TMT, the "B" versions of eTrails produced slower reaction times than their "A" form counterparts. Again, Scramble took the longest to complete and FlashNext was the fastest (24,668ms and 12,613ms, respectively). A one-way ANOVA comparing the "B" forms found that RTs differed significantly across the variants (F(7, 305) = 74.84, p < .05). Practice effects were also evident in this data set, with the second administration taking less time to complete than the first, as can be seen in Table 2b.

Turning to error rates for eTrails, six of the 41 participants made no errors on the eTrails variants (14.63%). On average, participants made .23 errors per condition. These small error rates relative to the TMT (which produced corresponding values of 9.75% and .89) are particularly impressive when one considers that each subject completed only four TMT forms but sixteen eTrails forms, providing much more opportunity for error on eTrails. The improvement in error rates becomes even more evident when one directly compares the TMT and eTrails-Standard: twenty of the forty-one subjects never made an error on eTrails-Standard (48.78%).

Divided Attention

As shown in Table 3, average corrected accuracy (hits minus false alarms) for the full attention condition was .73 (SD = .16) and for the divided attention condition was .36 (SD = .18). Divided attention performance was significantly impaired relative to full attention performance, t(40) = -14.32, p < .001. As an index of the cost of dividing attention, each participant's DA score was subtracted from their FA score. The average divided attention cost was .38 (with a range of .65, the lowest-performing individual scoring .13 and the highest-performing individual scoring .68). These results indicate both that the odd-numbers task was successful in dividing subjects' attention and that it produced a good range of performance, which is critical for the correlational analyses in the current study.
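The two derived measures described above are simple differences; as an illustrative sketch (not code used in the study, and with hypothetical function names):

```python
def corrected_accuracy(hit_rate, false_alarm_rate):
    """Corrected recognition accuracy: hit rate minus false-alarm rate."""
    return hit_rate - false_alarm_rate

def divided_attention_cost(fa_score, da_score):
    """Cost of dividing attention: full-attention corrected score
    minus divided-attention corrected score."""
    return fa_score - da_score

# Using the rounded group means reported above (.73 and .36):
print(round(divided_attention_cost(0.73, 0.36), 2))  # -> 0.37
```

The difference of the rounded group means (.37) departs slightly from the reported mean of individual costs (.38) because the study averaged each participant's own cost.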

Ospan

Operation Span was analyzed in two ways (see Table 4). The first, referred to as the relative score, involved summing the total number of items the participant correctly recalled. The average relative score was 26.1 (SD = 5.2) and ranged from 16 to 36. The second measure, the absolute score, was calculated by counting correctly recalled items only for trials on which the participant recalled every item correctly. The average absolute score was 13.4 (SD = 6.19), with scores ranging from 4 to 26. A rule of thumb for understanding performance on Operation Span is to look at the general distribution of performance: individuals with an absolute score of 9 or lower are considered "low spans", those with a score between 10 and 18 "mid spans", and those scoring 19 or over "high spans". In the current study, then, there were 13 low spans, 18 mid spans, and 10 high spans, again demonstrating a relatively normal and variable distribution of performance on this task.
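The difference between the two scoring schemes can be made concrete with a small sketch (hypothetical code written for illustration; the variable names are ours):

```python
def ospan_scores(sets):
    """sets: one (items_recalled_in_order, set_size) pair per Ospan set."""
    # Relative score: every item recalled in the correct order counts.
    relative = sum(correct for correct, size in sets)
    # Absolute score: items count only when the whole set was perfect.
    absolute = sum(correct for correct, size in sets if correct == size)
    return relative, absolute

# Three sets: a perfect 2-item set, a partial 3-item set, a perfect 3-item set.
print(ospan_scores([(2, 2), (1, 3), (3, 3)]))  # -> (6, 5)
```

The absolute score is always less than or equal to the relative score, which is why the absolute cutoffs (9, 18) sit lower than the relative means reported above.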

Stroop

The results for congruent, incongruent, and neutral trials are displayed in Table 5. The average reaction time was 564ms (SD = 84ms) for congruent trials, 704ms (SD = 130ms) for incongruent trials, and 600ms (SD = 99ms) for neutral trials. The omnibus ANOVA was significant (F(2, 80) = 100.41, p < .001), and all post-hoc comparisons showed that the three trial types differed significantly from one another. Facilitation scores were calculated by subtracting congruent RTs from neutral RTs (M = 37ms, SD = 47.21) and interference scores by subtracting neutral RTs from incongruent RTs (M = 103ms, SD = 73.90). The Stroop effect, an index of the degree to which an individual was influenced by the to-be-ignored word, was calculated by subtracting each person's congruent score from their incongruent score. The average Stroop effect was 140ms (SD = 73ms), consistent with previous literature on the Stroop effect in young adults (Spieler, Balota, & Faust, 1996).
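The three derived Stroop measures are simple differences among condition RTs; for clarity (an illustrative sketch, not the study's analysis code):

```python
def stroop_indices(congruent, neutral, incongruent):
    """Compute facilitation, interference, and the overall Stroop
    effect from mean RTs (in ms) for the three trial types."""
    facilitation = neutral - congruent       # benefit of a matching word
    interference = incongruent - neutral     # cost of a mismatching word
    stroop_effect = incongruent - congruent  # total influence of the word
    return facilitation, interference, stroop_effect

# Using the group means reported above (564, 600, and 704 ms):
print(stroop_indices(564, 600, 704))  # -> (36, 104, 140)
```

The small discrepancies with the reported means (37ms and 103ms) arise because the study averaged each participant's individual difference scores rather than differencing the rounded group means.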

Letter Comparison and Shipley

The average reaction time for the Letter Comparison task was 2,218ms (SD = 472ms). Because this is a recently computerized version of a traditionally pen-and-paper task, there is no clear standard of comparison for performance on this measure. It will, however, help to elucidate age differences in cognitive speed in follow-up studies of eTrails that include older participants. The average number of correct responses on the Shipley vocabulary measure was 28 out of 40 (SD = 3.77). These scores are comparable to, but slightly lower than, those typically observed for young adults on this task in the literature, which are often in the low- to mid-thirties (e.g., Kemper & Sumner, 2001; Spieler & Balota, 2000).

Task Reliability

Data Trimming

For all tasks, scores that were more than two standard deviations above or below an individual's mean were removed from the data set and were not included in statistical analyses. This resulted in 19 individual data points (fewer than one per participant) being deleted for the entire study.
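The trimming rule can be expressed as a short sketch (illustrative only, applied here to a single individual's scores on one task; the study applied the criterion per individual):

```python
from statistics import mean, stdev

def trim_outliers(scores, criterion=2.0):
    """Drop scores more than `criterion` standard deviations
    above or below the mean of the supplied scores."""
    m, s = mean(scores), stdev(scores)
    return [x for x in scores if abs(x - m) <= criterion * s]

# One aberrant score is removed; the typical scores survive.
print(trim_outliers([10, 11, 9, 10, 10, 10, 10, 10, 200]))
```

Note that with very small or very noisy samples a single extreme value inflates the standard deviation and can escape a two-SD criterion, which is one reason trimming was done per individual rather than across the whole sample.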

Test-Retest Reliability

For the Trails tasks, test-retest reliability was calculated by correlating scores on the first and second administrations of each task. As seen in Table 6a, the Trail Making Test achieved low but significant test-retest reliability, r = .37 (n = 37). Each version of eTrails had a considerably higher reliability estimate than its paper-and-pencil counterpart: eTrails-Standard had the highest reliability, r = .62 (n = 37), followed closely by eTrails-FlashRandom (r = .58, n = 38), eTrails-FlashNext (r = .56, n = 38), and eTrails-Scramble (r = .55, n = 36). All of the individual correlations reached significance (p < .05); however, only the difference in reliability between the TMT and eTrails-Standard was marginally significant (z = -1.51, p = .07).
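The thesis does not specify the exact test used to compare the two reliability coefficients. One common approach, sketched here purely for illustration, Fisher-transforms each r and compares them with a z statistic; this version assumes independent correlations, which is only an approximation here because the same sample produced both coefficients:

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """z test for the difference between two independent Pearson rs,
    via Fisher's r-to-z transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

# Comparing the TMT (r = .37) with eTrails-Standard (r = .62), n = 37 each:
print(round(compare_correlations(0.37, 37, 0.62, 37), 2))
```

This independent-samples approximation yields z of roughly -1.39; the reported value (z = -1.51) presumably reflects a test that accounts for the dependence between the two correlations.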


Split-Half Reliability

The criterion measures used in this study (FA/DA, Ospan, and Stroop) are commonly used in research because they are recognized as highly reliable measures. Nevertheless, the reliability of each criterion measure was assessed in the current study using a split-half procedure (because each measure was administered only once) together with the Spearman-Brown prophecy formula; the results appear in Table 6b. Our divided attention paradigm was found to have strong reliability: r = .92 for the divided attention task and r = .93 for the full attention task. Ospan reliability was somewhat lower than traditionally reported in the literature, r = .68 for absolute scoring and r = .71 for relative scoring (Conway et al., 2005, report it to generally be around .80). Stroop produced surprisingly poor reliability, only r = .65. This is likely due to some combination of the following: (1) motivational changes over the course of the task (the Stroop task was the last critical criterion measure to be administered, so cognitive fatigue and subject apathy were likely at their highest); (2) the binary format of the current Stroop task; and, most critically, (3) the fact that the Stroop effect is a subtraction of congruent from incongruent performance, and subtraction scores are notorious for producing lower reliability in the Stroop task compared with correlations using the response latencies from one or more conditions (Strauss, Allen, Jorgensen, & Cramer, 2005). In sum, with the exception of the divided attention task, these frequently used criterion measures achieved lower levels of reliability in the current study than expected.
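The split-half procedure correlates scores from the two halves of a task and then corrects that correlation for the shortened test length. The Spearman-Brown correction step can be sketched as follows (illustrative code, not the study's analysis scripts):

```python
def spearman_brown(r_half):
    """Boost a half-test correlation to estimate full-test reliability:
    r_full = 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)

# A half-test correlation of .50 implies full-length reliability of about .67.
print(round(spearman_brown(0.50), 2))  # -> 0.67
```

Because the boosted estimate always exceeds the raw half-test correlation (for positive r), the low values reported above for Ospan and Stroop are low even after this correction.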

24

Correlations Among Trails Tasks

The Pearson product-moment correlations between the Trails tasks can be found in Table 7. Correlations were conducted in two ways: the first set of correlations, described below, examines relationships between the "B" forms only, while the second examines subtraction scores (B minus A). The most evident pattern in these correlations is that the majority of the Trails tasks, including the TMT, correlated significantly with one another. There is, however, one clear exception to this pattern: both the TMT and eTrails-FlashNext lost much of their predictive power on the second administration. The potential weaknesses of these two Trails variants specifically, and the potential problems with multiple administrations of an executive task, are discussed in more detail in the General Discussion. As a general rule, though, this table points to fairly good construct validity for eTrails: eTrails generally correlates with the TMT and with itself, which suggests that each variant is measuring the same basic ability.

Correlations of Trails to Criterion Measures

Few correlations between the B forms of the Trails variants and the executive tasks achieved significance (Table 8). The single exception to this pattern was the divided attention task, which correlated with the first administration of the TMT (r = .363), the first administration of eTrails-Standard (r = .354), and the first administration of eTrails-FlashRandom (r = .354). Neither Ospan nor Stroop correlated with any of the TMT or eTrails tests. The only other notable correlation involving the criterion measures was between Ospan Absolute and the divided attention task (r = -.370).


Correlations Using Subtraction Scores

Our main hypotheses focused on the use of subtraction scores as the key measure for the Trails tasks, given their role as the principal index of executive ability in the original Trail Making Test. Unfortunately, none of the correlations involving the Trails subtraction scores achieved significance, including both correlations among the Trails tasks themselves (Table 9) and correlations with the criterion measures (Table 10). A few of the strongest correlations (largely involving the second administration of eTrails-Standard and the divided attention task) remained, but the general pattern is clearly one of nonsignificance. Also notable, though not easily interpretable, the Stroop task, which had not demonstrated a significant relationship in any of the other analyses, correlated significantly with the first administration of eTrails-FlashRandom.

DISCUSSION

The main goal of the current study was to standardize the procedures associated with the pen-and-paper TMT using computerized, touch-screen technology. By simply streamlining the administration of the TMT, it was expected that eTrails-Standard would produce fewer errors, show increased reliability, and produce stronger correlations with the other executive measures compared with the original TMT. Moreover, the eTrails variants designed to strategically tax specific executive abilities (inhibition and visual search) were expected to show even stronger correlations with other executive measures.

Achieving these goals would both provide a clearer understanding of the factors that drive the predictive power of the Trail Making Test and yield a potentially better diagnostic tool for executive deficits.


Computerizing the Trail Making Test

The first goal of the current study, improving the administration of the TMT through computerization, appears to have been achieved to some degree. Both the TMT and eTrails-Standard produced data consistent with the prior literature (Arbuthnott & Frank, 2000): in each case, subjects took longer to complete the "B" form of the task than the "A" form, and there was more variability in performance on the "B" forms. Both of these outcomes are consistent with the more executive nature of Trails B and point to sufficient individual differences in the current study to conduct correlational analyses.

More importantly, several limitations observed for the TMT in the current study appear to have been mitigated by computer administration. The first is the higher-than-expected error rate on the TMT. Most of the errors observed were due to rushed and careless performance (not connecting the pen line to a circle). These careless errors were eliminated in eTrails due to its strict criteria for what counts as a correct response. These restrictions make the error rate a more informative measure and the overall RTs a more valuable indicator of performance.

A second, potentially related finding is the reduced practice effect for eTrails compared with the TMT. There was a 20% reduction in average reaction time for the TMT from the first to the second administration (for the "B" forms) but only a 10.33% reduction for eTrails-Standard ("B" forms). The reduced practice effects may be due, in part, to the shallower learning curve of the computer version. Many of the confusing procedural aspects of the TMT are removed in eTrails (e.g., learning the mechanics involved in connecting the circles quickly, holding one's arm in a position that avoids blocking the target numbers/letters, understanding what to do after producing an incorrect trail). Therefore, while participants may use the first administration of the TMT to get accustomed to the task, and thus show substantial improvement the second time they encounter it, performance on eTrails would not show the same degree of change because participants likely start off performing the eTrails task at a more optimal level. A second factor likely contributing to the reduced practice effects on the eTrails variants is that, while the TMT involved repeated administration of the identical form, eTrails never repeated the same layout of targets. Thus, knowledge acquired in the first round of the TMT would have facilitated performance on the second round more directly than was the case for eTrails.

A final benefit of eTrails was evident in the correlations among the various eTrails variants and the TMT. The majority of the eTrails variants correlated with one another, which is indicative of solid construct validity for these new, computerized Trails tasks. One exception to this pattern is also noteworthy: the second administration of the TMT failed to demonstrate consistent correlations with the other Trails tasks. Given the importance of repeated testing in diagnosing dementia and other cognitive disorders, it is of some concern that a widely used test changes in its predictive power from its first to its second administration. Computerization of the Trails task appears to have successfully increased its consistency.

It also bears mentioning that one of the eTrails variants, eTrails-FlashNext, also did not fare well on the second administration. eTrails is intended as a measure of executive control, and FlashNext is ostensibly the least executive of all the eTrails variants. It differs from the other eTrails tasks in that participants can arrive at the correct answer by simply responding automatically to task cues (i.e., the flashing of the next item in the series). The automaticity of FlashNext is supported by its considerably faster reaction times compared with the other eTrails variants. These automatic processes are likely at their strongest during the second administration of FlashNext, as participants become increasingly reliant on these automatic signals. As performance on FlashNext becomes more automatic, it would be expected to correlate less with the other eTrails tasks measuring executive abilities.

Task Reliability

The picture of task reliability in the current study was mixed. On one hand, an encouraging outcome was the noticeable improvement in reliability for the eTrails variants compared with the TMT. Every version of eTrails produced a higher reliability coefficient (ranging from .55 to .62) than that produced by the TMT (.37). Moreover, the difference in reliability between the TMT and eTrails-Standard approached statistical significance. These improvements were likely due, at least in part, to the reduction of administration errors discussed above; in other words, unlike the TMT, eTrails performance across the two administrations was likely driven by a common set of executive processes rather than by idiosyncratic factors having to do with task administration.

Conversely, the reliability estimates for many of our established criterion measures were low. This was unexpected given that individual performance on each of our executive measures was consistent with expectations. Ospan is a generally reliable measure (achieving reliability of approximately .80; Conway et al., 2005); this level of reliability was not replicated in the current study, where split-half reliabilities of .68 and .71 were obtained for absolute and relative scoring, respectively. There is no clear explanation for this decreased reliability, although anecdotal evidence suggests that participants found Ospan a very frustrating task and may have "given up" halfway through.

Like Ospan, Stroop performance was consistent with prior research (Spieler, Balota, & Faust, 1996): mean reaction times for congruent trials were significantly faster than for neutral trials (i.e., there was significant facilitation) and incongruent trials were significantly slower than neutral trials (i.e., there was significant interference). This also produced ample variability in the resultant Stroop effects. Nevertheless, the Stroop task produced the lowest reliability estimate of all of the criterion measures (r = .65). As discussed earlier, this low reliability was likely due primarily to the use of difference scores (Stroop effects) as the main predictor for this task, but it might also have been due to the binary nature of the current task or to subject fatigue.

The one criterion task that showed both high split-half reliability and significant correlations with more than one of the Trails measures was the divided attention task. This finding, while clearly not as strong as anticipated, is noteworthy given the secondary goal of the current research: to explore the use of dual-task costs as a proxy for cognitive decline. When participants had their attention divided, their accuracy fell (from .73 to .36). This substantial drop suggests that our divided attention task succeeded in reducing participants' available cognitive resources, thereby making their data resemble an older adult's expected performance and providing our data set with a proxy for diverse levels of cognitive functioning. The significant correlations between eTrails and the divided attention task point to the potential for eTrails to be a sensitive and predictive measure across a range of cognitive ability.

CONCLUSIONS

Of the two main goals of this research (to produce a viable computerized Trails measure and to demonstrate its ability to predict a variety of other executive tasks), only the first was met with any degree of success. Even so, there are a number of take-home lessons from this research that help inform how to build a good Trails measure.

A Trails task must not be so complicated as to produce large numbers of procedural errors; such errors may compromise reliability. Conversely, a Trails task cannot be so easy that it can be performed automatically. Executive functioning is a difficult construct to measure because it is required only in novel situations that cannot be handled using routines or habits. Tasks, especially when repeatedly administered, run the risk of quickly becoming automated, such that one often has only one or two attempts to get an accurate measure of executive control. Note that the two best-performing eTrails variants in the current study, Scramble and FlashRandom, were the ones arguably least amenable to the build-up of automaticity. Moreover, given that our criterion tasks were performed so late in the experimental sequence, perhaps participants had exhausted their executive control and were running on automatic, rather than executive, abilities. In addition to maximizing executive functioning by administering key tasks early, future Trails studies should also use multiple forms of each test. The current results demonstrated the key role of multiple forms in mitigating practice effects and in producing the kind of reliability necessary for predicting the differences in cognition seen in Alzheimer's Disease.

Alzheimer's Disease is a destructive and expensive burden on society, a disease that is emotionally painful for both the individual and their caretakers. Although this study does not provide any definitive conclusions regarding the ability of our computerized Trails measure to predict the kinds of executive declines associated with Alzheimer's Disease, it does provide a starting point for creating a better Trails task. The TMT has proven itself a successful tool; the small increases in accuracy and reliability demonstrated here may provide the first steps toward increases in predictive power that could have far-reaching clinical and research benefits in the future.


REFERENCES

Arbuthnott, K., & Frank, J. (2000). Trail Making Test, Part B as a measure of executive control: Validation using a set-switching paradigm. Journal of Clinical and Experimental Neuropsychology, 22, 518-528.

Ashendorf, L., Jefferson, A. L., O'Connor, M. K., Chaisson, C., Green, R. C., & Stern, R. A. (2008). Archives of Clinical Neuropsychology, 23(2), 129-137.

Baddeley, A. D., & Hitch, G. (1974). Working memory. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation, Vol. 8 (pp. 67-89). New York: Academic Press.

Basso, M. R., Bornstein, R. A., & Lang, J. M. (1999). Practice effects on commonly used measures of executive function across twelve months. The Clinical Neuropsychologist, 13, 283-292.

Blacker, D., Lee, H., Muzikansky, A., Martin, E. C., Tanzi, R., McArdle, J. J., et al. (2007). Neuropsychological measures in normal individuals that predict cognitive decline. Archives of Neurology, 64, 862-871.

Bondi, M. W., Serody, A. B., Chan, A. S., Eberson-Shumate, S. C., Delis, D. C., Hansen, L. A., & Salmon, D. P. (2002). Cognitive and neuropathologic correlates of Stroop color-word test performance in Alzheimer's disease. Neuropsychology, 16(3), 335-343.

Buck, K. K., Atkinson, T. M., & Ryan, J. P. (2008). Evidence of practice effects in variants of the Trail Making Test during serial assessment. Journal of Clinical and Experimental Neuropsychology, 30, 312-318.

Cahn, D. A., Salmon, D. P., Butters, N., Wiederholt, W. C., Corey-Bloom, J., Edelstein, S. L., & Barrett-Connor, E. (1995). Detection of dementia of the Alzheimer type in a population-based sample: Neuropsychological test performance. Journal of the International Neuropsychological Society, 1, 252-260.

Caroli, A., & Frisoni, G. B. (2009). Quantitative evaluation of Alzheimer's disease. Expert Review of Medical Devices, 6(5), 569-688.

Chen, P., Ratcliff, G., Belle, S. H., Cauley, J. A., DeKosky, S. T., & Ganguli, M. (2001). Patterns of cognitive decline in presymptomatic Alzheimer's disease. Archives of General Psychiatry, 58, 853-858.

Conway, A. R. A., Kane, M. J., Bunting, M. F., Hambrick, D. Z., Wilhelm, O., & Engle, R. W. (2005). Working memory span tasks: A methodological review and user's guide. Psychonomic Bulletin & Review, 12, 769-786.

Craik, F. I. M. (1982). Selective changes in encoding as a function of reduced processing capacity. In Cognitive research in psychology (pp. 152-161). Berlin: Deutscher Verlag der Wissenschaften.

Daniels, K. A., Toth, J. P., & Jacoby, L. L. (2006). The aging of executive functions. In F. I. M. Craik & E. Bialystok (Eds.), Lifespan cognition: Mechanisms of change. New York, NY: Oxford University Press.

Drapeau, C. E., Bastien-Toniazzo, M., Rous, C., & Carlier, M. (2007). Nonequivalence of computerized and paper-and-pencil versions of the Trail Making Test. Perceptual & Motor Skills, 104(3), 785-791.

Engle, R. W. (2002). Working memory capacity as executive attention. Current Directions in Psychological Science, 11(1), 19-23.

Jacoby, L. L., Bishara, A. J., Hessels, S., & Toth, J. P. (2005). Aging, subjective experience, and cognitive control: Dramatic false remembering by older adults. Journal of Experimental Psychology: General, 134, 131-148.

Jacoby, L. L., Toth, J. P., & Yonelinas, A. P. (1993). Separating conscious and unconscious influences of memory: Measuring recollection. Journal of Experimental Psychology: General, 122(2), 139-154.

Johnson, J. K., Lui, L., & Yaffe, K. (2007). Executive function, more than global cognition, predicts functional decline and mortality in elderly women. The Journals of Gerontology Series A: Biological Sciences and Medical Sciences, 62, 1134-1141.

Kensinger, E. A., Shearer, D. K., Locascio, J. J., Growdon, J. H., & Corkin, S. (2003). Working memory in mild Alzheimer's disease and early Parkinson's disease. Neuropsychology, 17(2), 230-239.

Kinner, E. I., & Fernandes, M. A. (2009). Illusory recollection in older adults and younger adults under divided attention. Psychology and Aging, 24(1), 211-216.

Kubo, M., Shoshi, C., Kitawaki, T., Takemoto, R., Kinugasa, K., Yoshida, H., Honda, C., & Okamoto, M. (2008). Increase in prefrontal cortex blood flow during the computer version of the Trail Making Test. Neuropsychobiology, 58, 200-210.

Lezak, M. D., Howieson, D. B., & Loring, D. W. (2004). Neuropsychological assessment (4th ed.). New York: Oxford University Press.

MacLeod, C. M. (1992). The Stroop task: The "gold standard" of attention measures. Journal of Experimental Psychology: General, 121(1), 12-14.

Mitrushina, M. N., Boone, K. B., & D'Elia, L. F. (1999). Handbook of normative data for neuropsychological assessment. New York: Oxford University Press.

Rosen, V. M., Bergeson, J. L., Putnam, K., Harwell, A., & Sunderland, T. (2002). Working memory and apolipoprotein E: What's the connection? Neuropsychologia, 40(13), 2226-2233.

Salthouse, T. A., Toth, J. P., Daniels, K., Parks, C., Pak, R., Wolbrette, M., & Hocking, K. J. (2000). Effects of aging on efficiency of task switching in a variant of the Trail Making Test. Neuropsychology, 14, 102-111.

Sanchez-Cubillo, L., Perianez, J. A., Adrover-Roig, D., Rodriguez-Sanchez, J. M., Rios-Lago, M., Tirapu, J., & Barcelo, F. (2009). Construct validity of the Trail Making Test: Role of task-switching, working memory, inhibition/interference control, and visuomotor abilities. Journal of the International Neuropsychological Society, 15, 438-450.

Spieler, D. H., & Balota, D. A. (2000). Factors influencing word naming in younger and older adults. Psychology and Aging, 16(2), 312-322.

Spieler, D. H., Balota, D. A., & Faust, M. E. (1996). Stroop performance in healthy younger and older adults and in individuals with dementia of the Alzheimer's type. Journal of Experimental Psychology: Human Perception and Performance, 22, 461-479.

Storandt, M. (2008). Cognitive deficits in the early stages of Alzheimer's Disease.

Current Directions in Psychological Science, 17, 198-202

36

Strauss, E., Sherman, E.M.S., & Spreen, O. (2006). A Compendium of

neuropsychological tests: Administration, norms, and commentary. (3rd. ed.). NY.

Oxford University Press.

Turner, M. L. & Engle, R. W. (1989). Is working memory capacity task dependent?

Journal of Memory and Language, 28, 127-154.

Reitan, R. M., & Wolfson, D. (1985). The Halstead-Reitan Neuropsychological Test

Battery: Theory and Clinical Interpretation. Tucson: Neuropsychology press.

Verhaegen, P., Steitz, D., Sliwinski, M., & Cerella, J. (2003) Aging and Dual task

performance: A meta-analysis. Psychology and Aging, 18, 443-460.


Table 1

Trail Making Test Scores

          TMTA 1st          TMTA 2nd          TMTB 1st          TMTB 2nd
          Administration    Administration    Administration    Administration
Average   21518             18586             46789             37408
SD        7035              4374              17046             11658

*scores in milliseconds


Table 2a

Form "A" Statistics for eTrails

          eTrails    eTrails    eTrails     eTrails     eTrails    eTrails    eTrails       eTrails
          Standard   Standard   FlashNext   FlashNext   Scramble   Scramble   FlashRandom   FlashRandom
          A1         A2         A1          A2          A1         A2         A1            A2
Average   14124      13915      12412       11789       19508      18495      15839         14155
SD        3114       3335       2517        2429        4063       3883       3508          3076

*scores in milliseconds

Table 2b

Form "B" Statistics for eTrails

          eTrails    eTrails    eTrails     eTrails     eTrails    eTrails    eTrails       eTrails
          Standard   Standard   FlashNext   FlashNext   Scramble   Scramble   FlashRandom   FlashRandom
          B1         B2         B1          B2          B1         B2         B1            B2
Average   18257      16370      12834       12613       24669      23380      19450         16284
SD        5080       3872       2948        2660        5840       5304       5746          4441

*scores in milliseconds


Table 3

Divided Attention Task Results

          DA ACC      FA ACC      FA-DA
          Corrected   Corrected
Average   0.37        0.73        0.37
SD        0.18        0.16        0.17


Table 4

OSPAN Results

          Ospan      Ospan
          Relative   Absolute
Average   26.12      13.44
SD        5.21       6.19


Table 5

Stroop Results

          Congruent   Incongruent   Neutral   Facilitation   Interference   Stroop Effect
Average   564         704           601       37             103            140
SD        84.17       130.02        99.00     47.21          73.90          72.57

*scores in milliseconds


Table 6a

Reliability for Trails Tasks

          TMT       eTrailsStandard   eTrailsFlashNext   eTrailsScramble   eTrailsFlashRandom
Time 1    46,789    18,415            12,880             24,429            19,744
Time 2    37,408    16,644            12,815             23,380            16,301
R         0.37      0.62              0.56               0.55              0.58

*Time in milliseconds

Table 6b

Reliability for Criterion Measures

                          Divided     Full        Ospan      Ospan      Stroop
                          Attention   Attention   Relative   Absolute
Split-Half Reliability    0.92        0.93        0.71       0.68       0.65


Table 7

Correlations Between Trails Tasks

                         TMT B1   TMT B2   Std B1   Std B2   FN B1   FN B2   Scr B1   Scr B2   FR B1
TMT B2                   0.366
eTrails Standard B1      0.668    0.315
eTrails Standard B2      0.593    0.296    0.62
eTrails FlashNext B1     0.547    0.494    0.488    0.655
eTrails FlashNext B2     0.34     0.016    0.37     0.378    0.558
eTrails Scramble B1      0.425    0.168    0.484    0.542    0.486   0.431
eTrails Scramble B2      0.537    0.5      0.385    0.626    0.588   0.224   0.547
eTrails FlashRandom B1   0.568    0.505    0.461    0.469    0.381   0.1     0.297    0.537
eTrails FlashRandom B2   0.416    0.595    0.436    0.479    0.527   0.089   0.375    0.719    0.575

Note: Std = eTrails Standard; FN = eTrails FlashNext; Scr = eTrails Scramble; FR = eTrails FlashRandom.
*green=significant


Table 8

Correlations Between Trails Tasks and Criterion Measures

                  TMT      TMT      Std      Std      FN       FN       Scr      Scr      FR       FR
                  B1       B2       B1       B2       B1       B2       B1       B2       B1       B2
Stroop            -0.165   0.006    -0.081   0.021    0.181    0.176    0.31     0.14     -0.124   0.061
Ospan Relative    -0.124   0.111    -0.012   -0.183   -0.146   -0.089   -0.39    -0.105   0.033    -0.063
Ospan Absolute    -0.224   0.062    -0.033   -0.129   -0.027   0.047    0.047    -0.004   0.067    -0.056
FADA              0.362    0.303    0.354    0.187    0.196    0.1963   -0.091   0.166    0.354    0.273

Note: Std = eTrails Standard; FN = eTrails FlashNext; Scr = eTrails Scramble; FR = eTrails FlashRandom.
*green=significant


Table 9

Correlations Between Trails Tasks Using Subtraction Scores

                        TMT 1    TMT 2    Std 1    Std 2    FN 1    FN 2    Scr 1    Scr 2    FR 1
TMT 2                   0.116
eTrails Standard 1      0.161    -0.034
eTrails Standard 2      0.218    0.083    0.241
eTrails FlashNext 1     0.124    0.375    0.004    0.421
eTrails FlashNext 2     0.317    0.074    0.045    -0.065   0.028
eTrails Scramble 1      0.317    0.122    0.102    0.151    0.221   0.136
eTrails Scramble 2      -0.164   0.262    0.055    0.059    0.283   0.049   0.2
eTrails FlashRandom 1   0.162    0.383    0.046    -0.7     0.126   0.334   -0.095   -0.016
eTrails FlashRandom 2   0.375    0.338    0.247    0.085    0.052   0.121   0.3      0.156    -0.036

Note: Std = eTrails Standard; FN = eTrails FlashNext; Scr = eTrails Scramble; FR = eTrails FlashRandom.
*green=significant


Table 10

Correlations Between Trails Tasks and Criterion Measures Using Subtraction Scores

                  TMT      TMT      Std      Std      FN       FN       Scr      Scr      FR       FR
                  1        2        1        2        1        2        1        2        1        2
Stroop            -0.184   0.072    0.056    0.133    -0.084   0.057    0.109    0.313    -0.353   -0.057
Ospan Relative    0.089    0.075    -0.034   -0.37    0.194    0.00     0.087    0.076    0.106    0.049
Ospan Absolute    0.004    0.056    0.97     -0.336   0.241    0.019    0.036    0.22     0.06     -0.027
FADA              0.312    0.175    0.115    0.41     0.263    0.186    -0.176   -0.01    0.31     -0.053

Note: Std = eTrails Standard; FN = eTrails FlashNext; Scr = eTrails Scramble; FR = eTrails FlashRandom.
*green=significant


Appendix A

Figure 1

Trails B Example


Appendix B

Figure 2

Example of Button Layout Procedure

1 2 1 2

3 4 3 4

1 2 1 2

3 4 3 4
