Generating Anagrams from Multiple Core Strings Employing User-Defined Vocabularies and Orthographic Parameters
Behavior Research Methods, Instruments, & Computers 2003, 35 (1), 129-135

TIMOTHY R. JORDAN and AXEL MONTEIRO
University of Nottingham, Nottingham, England

Anagrams are used widely in psychological research. However, generating a range of strings with the same letter content is an inherently difficult and time-consuming task for humans, and current computer-based anagram generators do not provide the controls necessary for psychological research. In this article, we present a computational algorithm that overcomes these problems. Specifically, the algorithm processes automatically each word in a user-defined source vocabulary and outputs, for each word, all possible anagrams that exist as words (or as nonwords, if required) as defined by the same source vocabulary. Moreover, we show how the output of the algorithm can be filtered to produce anagrams within specific user-defined orthographic parameters. For example, the anagrams produced can be filtered to produce words that share, with each other or with other words in the source vocabulary, letters in only certain positions. Finally, we provide free access to the complete Windows-based program and source code containing these facilities for anagram generation.

This work was supported by BBSRC Grant 42/S12111 to T.R.J. The order of authorship is alphabetical, and both authors contributed equally to this work. Correspondence concerning this article should be addressed to either T. R. Jordan or A. Monteiro, School of Psychology, University of Nottingham, University Park, Nottingham NG7 2RD, England (e-mail: trj@psychology.nottingham.ac.uk or lpxaxm@psychology.nottingham.ac.uk).

Copyright 2003 Psychonomic Society, Inc.

Anagrams play an important and pervasive role in psychological research. For example, anagrams provide a measure of problem-solving ability (i.e., where the task is to generate a word composed of the same letters as a presented string) that has been used to study a range of psychological issues, including insight (e.g., Smith & Kounios, 1996), aging (e.g., Witte & Freund, 1995), recognition memory (e.g., Weldon, 1991), semantic memory (e.g., White, 1988), and the topography of evoked brain activity (e.g., Skrandies, Reik, & Kunze, 1999). Anagrams have also been used extensively to study processes involved in word recognition, where a great deal of research involves comparing performances between stimuli from different linguistic categories, including frequency, imageability, concreteness, orthographic structure, and lexicality (words vs. nonwords).¹ For example, when performance with words and nonwords was compared, several studies (e.g., Gibson, Pick, Osser, & Hammond, 1962; Jordan, Patching, & Milner, 2000; Mason, 1975; Massaro & Klitzke, 1979; Massaro, Venezky, & Taylor, 1979; Reicher, 1969) have used stimuli matched for their individual letter content (e.g., show vs. ohws). The attraction of this matching is that differences in basic letter content can be removed as a confounding variable between categories and so allow influences of linguistic category (e.g., frequency, imageability, concreteness, orthographic structure, lexicality) on performance to be revealed more clearly.

The appropriate selection of core strings (e.g., slate) and the generation of anagrams (in this case, including the words least, stale, steal, tales, teals, and tesla) for use in any psychological research requires certain controls over the procedure of core string selection and anagram generation that will avoid confounds that may contaminate the data produced by an experiment. First, control over the source vocabulary used to validate (as legal words) core strings and their letter-string permutations ensures that only linguistically appropriate core strings and anagrams are used. Most obviously, defining the language within which core strings exist and from which anagrams are generated ensures that all permutations are relevant for a particular participant population. However, this is important not only for determining core strings and anagrams for languages that are highly individual (e.g., English vs. French), but also for core strings and anagrams specific to variations within a language; for example, despite their overall similarity, American, Australian, British, and Canadian English contain words that vary in their presence, spelling, and frequency of usage across these vocabularies. In addition, control over the source vocabulary allows anagrams to be produced from either an exhaustive search of the entire vocabulary of a language or a subset of the vocabulary (e.g., one including only words above a certain frequency of usage).

Second, the selection of core strings and anagrams for an experiment is facilitated and its validity enhanced when all the anagrams of all the words in the chosen vocabulary are available at the start of the selection process. Without this availability, the appropriateness of core strings and their anagrams for inclusion in an experiment is difficult to determine, and the entire selection process is susceptible to experimenter bias. For example, core words that are subjectively more likely to be generated (e.g., of higher frequencies of occurrence) are more likely to be selected. Indeed, knowing the number of real-word anagrams that can be produced from a particular core string and how this compares with the number produced by other, potential core strings is crucial for assessing the suitability of a core string for a particular task. For example, when nonword anagrams are selected for use in problem-solving experiments or as controls in word recognition experiments, nonwords with just one real-word anagram would provide qualitatively different stimuli, as compared with those for which two or more real-word anagrams exist.

Third, control over the nature of the anagrams generated allows a more refined and focused use of anagrams in psychological research. In particular, when anagrams of core strings are generated, user-defined constraints placed on the operation of the generating algorithm allow only the types of anagram relevant to the aims of an experiment to be specified and produced and so avoid the production of all (including unwanted) combinations that satisfy the general principle of an anagram: For example, anagrams above a certain frequency of written occurrence, anagrams for which letters in only certain positions in the core word are transposed, or anagrams that share letters or groups of letters in only certain positions with other words in the source vocabulary.

However, generating a range of different strings with the same letter content is an inherently difficult and time-consuming task (at least for humans), particularly when each permutation must be verified as a word or a nonword by checking for its existence in the appropriate vocabulary. Indeed, whereas the generation and verification of all possible permutations is feasible for a single core string of three or four letters, the generation and verification of all possible permutations for multiple core strings, especially those of five letters or more, demands computational involvement to achieve acceptable levels of accuracy, efficiency,

lier (see Table 1), more specifically, with respect to the following.

1. The vast majority of anagram generators currently available provide no facility for determining the system's source vocabulary (used to verify the core letter-string and its permutations as legal words), and most provide no indication of the source used. This is unsuitable for the production of experimental stimuli, for which knowledge and control of the nature of the source vocabulary is required to produce a stimulus set of known characteristics and maximal ecological validity. In particular, an inappropriate source vocabulary may not provide anagrams representative of the linguistic environment of participants. Indeed, problems arise not only when the source vocabulary has an unknown content, but also when the content is known but is less than ideal: for example, when it contains too few words or contains spellings that are inappropriate for a particular participant population (e.g., American English for British participants).

2. All anagram generators currently available allow the production of anagrams from core letter strings only by taking as input one core string at a time. This input string must then be processed before the next input string can be entered by the researcher. This piecemeal approach is unsuited to the production of experimental stimuli, where sufficient numbers of appropriately matched stimuli are likely to be achieved only after permutations from several thousand input strings have been calculated.

3. No anagram generator currently available allows any user-defined, research-relevant constraints to be placed on the operation of the generating algorithm. Thus, when anagrams of core strings are derived, all combinations that satisfy the general principle of an anagram are generated without allowing constraints on such things as the frequency of occurrence or the orthographic structure of the strings produced.

In this article, we present a computational algorithm for producing anagrams from any suitable user-defined source vocabulary. Specifically, the algorithm takes in each word in a chosen vocabulary and outputs, for each word, all possible anagrams that exist in the same vocabulary (and, if re-