SUPPLEMENTAL MATERIALS
Problems of reliability and validity with similarity derived from category fluency

This material supplements but does not replace the content of the peer-reviewed paper published in Psychiatry Research.
Our category fluency data can be compared with earlier data from Francis S. Bellezza (1984) to check whether they are representative of participant samples other than the one in the main article. Bellezza also had participants perform a category fluency task twice, with an interval of one week between sessions. His participants were undergraduate students enrolled in introductory psychology courses at Ohio University (instead of speech and language therapy students at a Belgian university), and they were given three minutes to write down as many category exemplars as they could think of (instead of one minute to verbalize exemplars). Despite these methodological differences, both studies yield similar results and conclusions.

Bellezza reports five indices that describe the category fluency data: (i) within-participants overlap, measured as the common element correlation between sessions, (ii) the probability of generating an exemplar during session 2 given that it was one of the first five exemplars generated in session 1, (iii) the Spearman rank correlation between exemplars common to both sessions, (iv) the probability of generating adjacent exemplars across sessions, and (v) between-participants overlap, measured as the common element correlation between participants who were randomly paired within sessions. When we compute the averages of these indices for our data (see the Supplemental Table), we find no systematic differences from the values Bellezza reported over 30 years ago, indicating that our category fluency data are representative and not unique to our particular sample.

The indices that Bellezza (1984) proposes are measures of reliability. They indicate the extent to which exemplar generation occurs in the same manner across participants and sessions. The values obtained for the indices, both in Bellezza’s study and in our own, indicate that consistency within participants across sessions and between participants within sessions is only moderate. These results for non-semantic indices help explain the results we found for semantic structure in the main article. When there is little consistency in which exemplars participants generate and in the order in which they generate them, relying on this information to determine similarities between exemplars will yield a poor estimate of the true similarity of those exemplars. The finding that “retrieval of exemplars from semantic memory occurs with only a modest amount of reliability” (Bellezza, 1984, p. 324) can therefore be taken as an additional argument against deriving similarities from category fluency data.
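To make indices (i) to (iii) concrete, the following Python sketch computes them for one participant's two sessions. It is an illustration under stated assumptions, not the procedure used by Bellezza or in the main article. In particular, the common element correlation is assumed here to be the number of shared exemplars divided by the square root of the product of the two list lengths, a definition the text above does not spell out. The function names and the example lists are hypothetical. Index (iv) is left out because its exact definition is not given here, and index (v) would apply the same overlap function to the lists of two different participants.

```python
from math import sqrt


def common_element_correlation(list_a, list_b):
    """Overlap between two exemplar lists.

    ASSUMPTION: computed as n_common / sqrt(n_a * n_b); the source text
    names the index but does not give its formula.
    """
    set_a, set_b = set(list_a), set(list_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / sqrt(len(set_a) * len(set_b))


def p_repeat_first_k(session1, session2, k=5):
    """Proportion of the first k session-1 exemplars that recur in session 2."""
    first = session1[:k]
    if not first:
        return 0.0
    seen_in_2 = set(session2)
    return sum(item in seen_in_2 for item in first) / len(first)


def spearman_common(session1, session2):
    """Spearman rank correlation of output order for exemplars in both sessions.

    Output positions are re-ranked among the common exemplars only. Because
    positions within a list are never tied, the no-ties formula is exact.
    Returns None when fewer than two exemplars are shared.
    """
    s1 = list(dict.fromkeys(session1))  # drop repeats, keep first occurrence
    s2 = list(dict.fromkeys(session2))
    in_s2 = set(s2)
    common = [x for x in s1 if x in in_s2]
    n = len(common)
    if n < 2:
        return None
    rank1 = {x: r for r, x in enumerate(common)}
    rank2 = {x: r for r, x in enumerate(x for x in s2 if x in rank1)}
    d_squared = sum((rank1[x] - rank2[x]) ** 2 for x in common)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical ANIMALS lists from one participant, one week apart.
session1 = ["dog", "cat", "horse", "cow", "lion", "tiger"]
session2 = ["cat", "dog", "cow", "sheep", "lion"]

print(round(common_element_correlation(session1, session2), 2))  # 0.73
print(round(p_repeat_first_k(session1, session2), 2))            # 0.8
print(round(spearman_common(session1, session2), 2))             # 0.8
```

Averaging such per-participant values within a category would give figures comparable in kind to those in the Supplemental Table, provided the same definitions are used.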
Supplemental Table
Comparison of own category fluency data with that of Bellezza (1984)

           Within-Participants     P(Session 2 |           Correlation Between     Probability of          Between-Participants
           Overlap                 Session 1)*             Session 1 Order and     Generating Adjacent     Overlap
                                                           Session 2 Order         Exemplars
Category   own     Bellezza        own     Bellezza        own     Bellezza        own     Bellezza        own     Bellezza
ANIMALS    0.52    0.68            0.68    0.84            0.31    0.52            0.08    0.14            0.31    0.50
FRUIT      0.67    0.79            0.83    0.92            0.50    0.59            0.09    0.15            0.50    0.60
FURNITURE  0.58    0.66            0.82    0.82            0.58    0.72            0.09    0.08            0.46    0.37
VEHICLES   0.64    0.65            0.84    0.77            0.53    0.38            0.14    0.12            0.42    0.32

* This value represents the estimated probability of generating an exemplar in session 2 given that it was one of the first five exemplars generated in session 1.
References

Bellezza, F. S., 1984. Reliability of retrieval from semantic memory: Common categories. Bulletin of the Psychonomic Society 22, 324-326.
