Behavior Research Methods, Instruments, & Computers 2003, 35 (1), 129-135

Generating anagrams from multiple core strings employing user-defined and orthographic parameters

TIMOTHY R. JORDAN and AXEL MONTEIRO
University of Nottingham, Nottingham, England

Anagrams are used widely in psychological research. However, generating a range of strings with the same letter content is an inherently difficult and time-consuming task for humans, and current computer-based anagram generators do not provide the controls necessary for psychological research. In this article, we present a computational algorithm that overcomes these problems. Specifically, the algorithm processes automatically each word in a user-defined source vocabulary and outputs, for each word, all possible anagrams that exist as words (or as nonwords, if required) as defined by the same source vocabulary. Moreover, we show how the output of the algorithm can be filtered to produce anagrams within specific user-defined orthographic parameters. For example, the anagrams produced can be filtered to produce words that share, with each other or with other words in the source vocabulary, letters in only certain positions. Finally, we provide free access to the complete Windows-based program and source code containing these facilities for anagram generation.

This work was supported by BBSRC Grant 42/S12111 to T.R.J. The order of authorship is alphabetical, and both authors contributed equally to this work. Correspondence concerning this article should be addressed to either T. R. Jordan or A. Monteiro, School of Psychology, University of Nottingham, University Park, Nottingham NG7 2RD, England (e-mail: [email protected] or [email protected]).

Anagrams play an important and pervasive role in psychological research. For example, anagrams provide a measure of problem-solving ability (i.e., where the task is to generate a word composed of the same letters as a presented string) that has been used to study a range of psychological issues, including insight (e.g., Smith & Kounios, 1996), aging (e.g., Witte & Freund, 1995), recognition memory (e.g., Weldon, 1991), semantic memory (e.g., White, 1988), and the topography of evoked brain activity (e.g., Skrandies, Reik, & Kunze, 1999). Anagrams have also been used extensively to study processes involved in word recognition, where a great deal of research involves comparing performances between stimuli from different linguistic categories, including frequency, imageability, concreteness, orthographic structure, and lexicality (words vs. nonwords).1 For example, when performance with words and nonwords was compared, several studies (e.g., Gibson, Pick, Osser, & Hammond, 1962; Jordan, Patching, & Milner, 2000; Mason, 1975; Massaro & Klitzke, 1979; Massaro, Venezky, & Taylor, 1979; Reicher, 1969) have used stimuli matched for their individual letter content (e.g., show vs. ohws). The attraction of this matching is that differences in basic letter content can be removed as a confounding variable between categories and so allow influences of linguistic category (e.g., frequency, imageability, concreteness, orthographic structure, lexicality) on performance to be revealed more clearly.

The appropriate selection of core strings (e.g., slate) and the generation of anagrams (in this case, including the words least, stale, steal, tales, teals, and tesla) for use in any psychological research requires certain controls over the procedure of core string selection and anagram generation that will avoid confounds that may contaminate the data produced by an experiment. First, control over the source vocabulary used to validate (as legal words) core strings and their letter-string permutations ensures that only linguistically appropriate core strings and anagrams are used. Most obviously, defining the language within which core strings exist and from which anagrams are generated ensures that all permutations are relevant for a particular participant population. However, this is important not only for determining core strings and anagrams for languages that are highly individual (e.g., English vs. French), but also for core strings and anagrams specific to variations within a language; for example, despite their overall similarity, American, Australian, British, and Canadian English contain words that vary in their presence and frequency of usage across these vocabularies. In addition, control over the source vocabulary allows anagrams to be produced from either an exhaustive search of the entire vocabulary of a language or a subset of the vocabulary (e.g., one including only words above a certain frequency of usage). Second, the selection of core strings and anagrams for an experiment is facilitated and its validity enhanced when all the anagrams of all the words in the chosen vocabulary are available at the start of the selection process.


Without this availability, the appropriateness of core strings and their anagrams for inclusion in an experiment is difficult to determine, and the entire selection process is susceptible to experimenter bias. For example, core words that are subjectively more likely to be generated (e.g., of higher frequencies of occurrence) are more likely to be selected. Indeed, knowing the number of real-word anagrams that can be produced from a particular core string, and how this compares with the number produced by other, potential core strings, is crucial for assessing the suitability of a core string for a particular task. For example, when nonword anagrams are selected for use in problem-solving experiments or as controls in word recognition experiments, nonwords with just one real-word anagram would provide qualitatively different stimuli, as compared with those for which two or more real-word anagrams exist.

Third, control over the nature of the anagrams generated allows a more refined and focused use of anagrams in psychological research. In particular, when anagrams of core strings are generated, user-defined constraints placed on the operation of the generating algorithm allow only the types of anagram relevant to the aims of an experiment to be specified and produced, and so avoid the production of all (including unwanted) combinations that satisfy the general principle of an anagram: for example, anagrams above a certain frequency of written occurrence, anagrams for which letters in only certain positions in the core word are transposed, or anagrams that share letters or groups of letters in only certain positions with other words in the source vocabulary.

However, generating a range of different strings with the same letter content is an inherently difficult and time-consuming task (at least for humans), particularly when each permutation must be verified as a word or a nonword by checking for its existence in the appropriate vocabulary. Indeed, whereas the generation and verification of all possible permutations is feasible for a single core string of three or four letters, the generation and verification of all possible permutations for multiple core strings, especially those of five letters or more, demands computational involvement to achieve acceptable levels of accuracy, efficiency, and validity.

Several computer-based anagram generators are available, and each takes in a single word, phrase, or sentence and transposes the letters to produce a new legal English word, phrase, or sentence. Many of these generators are implemented over the Internet and can be freely accessed using standard Web browsers, such as Internet Explorer and Netscape Navigator (see Appendix A). Several others are available as freeware or shareware programs and can be downloaded from the Internet and used either for free or on payment of a nominal fee. Commercial anagram-generating programs are also available and generally offer more sophisticated anagram-generating algorithms, but at a greater cost (see Appendix B).

However, irrespective of their source, none of these anagram generators are well suited to the production of experimental stimuli for psychological research, because they fail on at least two of the three controls outlined earlier (see Table 1)—more specifically, with respect to the following.

1. The vast majority of anagram generators currently available provide no facility for determining the system's source vocabulary (used to verify the core letter string and its permutations as legal words), and most provide no indication of the source used. This is unsuitable for the production of experimental stimuli, for which knowledge and control of the nature of the source vocabulary is required to produce a stimulus set of known characteristics and maximal ecological validity. In particular, an inappropriate source vocabulary may not provide anagrams representative of the linguistic environment of participants. Indeed, problems arise not only when the source vocabulary has an unknown content, but also when the content is known but is less than ideal—for example, when it contains too few words or contains words that are inappropriate for a particular participant population (e.g., American English for British participants).

2. All anagram generators currently available allow the production of anagrams from core letter strings only by taking as input one core string at a time. This input string must then be processed before the next input string can be entered by the researcher. This piecemeal approach is unsuited to the production of experimental stimuli, where sufficient numbers of appropriately matched stimuli are likely to be achieved only after permutations from several thousand input strings have been calculated.

3. No anagram generator currently available allows any user-defined, research-relevant constraints to be placed on the operation of the generating algorithm. Thus, when anagrams of core strings are derived, all combinations that satisfy the general principle of an anagram are generated, without allowing constraints on such things as the frequency of occurrence or the orthographic structure of the strings produced.

In this article, we present a computational algorithm for producing anagrams from any suitable user-defined source vocabulary. Specifically, the algorithm takes in each word in a chosen vocabulary and outputs, for each word, all possible anagrams that exist in the same vocabulary (and, if required, all that do not—i.e., nonwords). Moreover, the output of the algorithm can be filtered to produce anagrams within specific user-defined orthographic parameters. For example, the anagrams produced can be filtered to produce words that share, with each other or with other words in the source vocabulary, letters in only certain positions.

We present a design for a computer program that can take in large numbers of words (e.g., from any suitable source vocabulary) in a single submission (i.e., without individual word input) and can produce anagrams for each word within user-defined constraints. By providing this information, our intention is to allow the algorithm to be implemented within the bespoke programs of researchers to produce strings specific to the demands of individual experiments.

Table 1
Limitations of Current Computer-Based Anagram Generators

                                                                       Multiple Core    User-Defined
Generator                                                   Control    Input Strings    Output Constraints

On-Line Anagram Generators
The WWW Anagram Generator                                   no         no               no
Brendan's On-Line Anagram Generator                         no         no               no
Internet Anagram Server                                     no         no               no
Andy's Anagram Solver                                       no         no               no
The Anagram Engine                                          no         no               no
Arrak Anagrams                                              no         no               no
Inge's Anagram Generator                                    no         no               no
Anagram Dictionary                                          no         no               no
Martin's Anagram Generator                                  no         no               no
Hiten Sonpal's Anagrams                                     no         no               no
Jumbles Unjumbled                                           no         no               no

Freeware and Shareware Anagram Generators
Winagram 1.0                                                yes        no               no
Logos (http://homepage.ntlworld.com/jeremy.riley/Logos)     no         no               no
Anagram Generator 1.19                                      yes        no               no
The Electronic Alveary 1.6                                  no         no               no
Anagrams 2.0                                                yes        no               no
ABC Genius (http://www.bebesoft.com)                        yes        no               no
TeaBag 2.0 (http://www.tiac.net/users/hlynka/anagrams)      yes        no               no
PuzzLex 5.11 (http://www.puzzlex.co.uk)                     no         no               no

Commercial Anagram Generators
Anagram Genius 8.0                                          yes        no               no

To date, we have implemented the algorithm in C/C++ and verified its use with a number of substantial vocabularies readily and freely available in electronic format, including the MRC Psycholinguistic Database (http://www.psy.uwa.edu.au/MRCDataBase/mrc2.html), the Carnegie Mellon Pronouncing Dictionary (http://www.speech.cs.cmu.edu/cgi-bin/cmudict), the Moby word lists (http://www.dcs.shef.ac.uk/research/ilash/Moby), and the Natural Language wordlists at DEC (ftp://gatekeeper.dec.com/pub/misc/stolfi-wordlists/english.tar.Z). In addition, the algorithm can also be readily applied to the file /usr/dict/words, supplied as part of most Unix systems.

The Anagram Algorithm
The general design of a program to determine anagrams for a specific vocabulary is as follows:

Start:  Read the vocabulary into a data structure in memory.
        Iterate through the core strings in the vocabulary; for each core string:
            Output the core string.
            Calculate all permutations for the letters of the current word.
            Check each permutation against the input vocabulary. Discard any
            permutations that do not appear in the input vocabulary.
            Output the permutations for this word.
Exit:   Finished processing all the core strings in the vocabulary. Apply any
        further filtering as necessary (e.g., discard permutations that do not
        contain particular letters in certain serial positions).

The above design outlines the major components of a program for determining all anagrams for a given source vocabulary. The program begins by reading a target vocabulary into an appropriate data structure held in memory. Once this has been achieved, all core strings in the vocabulary are processed one by one. For each core string, all possible permutations of the letters of the core string are determined, and then each of these permutations is validated as a legal word by checking against the source vocabulary. An implementation of the above anagram algorithm is relatively straightforward to produce. However, the overall efficiency (and utility) of the complete implemented program is highly dependent on the specific implementation of the algorithm component that determines the permutations for each core string.

Determining all of the permutations for each core string can be achieved in several ways. However, since vocabularies are likely to contain words of different lengths, it is desirable that the implementation should be capable of producing permutations for all core strings in the vocabulary, regardless of differences in their lengths. If this is not the case, the vocabulary must be divided into subvocabularies according to core string length. Each subvocabulary must then be processed individually, and the implementation of the anagram program must be adjusted for each subvocabulary (i.e., for each core string length required).

Producing the permutations so that the same procedure is implemented regardless of word length is the most algorithmically complex component of an anagram-generating program. We will therefore present this lower-level permutations algorithm in more detail. (An implementation of the permutations algorithm as a C/C++ function is provided in Appendix C.)
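As a concrete illustration of this general design (a minimal sketch only, not the authors' published program), the following C++ fragment reads a vocabulary assumed to be stored one word per line, holds it in a hash set, and, for each core string, generates and checks every letter permutation. The standard library's next_permutation is used here purely as a stand-in for the dedicated permutations routine described in the next section and listed in Appendix C; the program name and file format are assumptions.

    #include <algorithm>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <unordered_set>

    int main(int argc, char *argv[])
    {
        if (argc < 2) {
            std::cerr << "usage: anagrams <vocabulary file>\n";
            return 1;
        }

        // Start: read the vocabulary into a data structure held in memory.
        std::unordered_set<std::string> vocabulary;
        std::ifstream in(argv[1]);
        for (std::string word; in >> word; )
            vocabulary.insert(word);

        // Iterate through the core strings in the vocabulary.
        for (const std::string &core : vocabulary) {
            std::cout << core << ":";

            // Calculate all permutations of the letters of the current word and
            // check each one against the input vocabulary, discarding permutations
            // that do not appear there (or keeping only those, if nonwords are required).
            std::string perm = core;
            std::sort(perm.begin(), perm.end());
            do {
                if (perm != core && vocabulary.count(perm))
                    std::cout << " " << perm;
            } while (std::next_permutation(perm.begin(), perm.end()));

            std::cout << "\n";
            // Any further filtering (e.g., on letter positions) would be applied here.
        }
        return 0;
    }

Because the number of permutations grows as N!, the permutations step dominates the running time for longer core strings, which is why the specific implementation of that component matters, as noted above.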

The Permutations Algorithm
The total number of permutations for a core string composed of N letters is given by N! (i.e., N-factorial). Therefore, if a core string is four letters long, there is a total of 24 permutations of the letters of that core string (e.g., 4! = 4 x 3 x 2 x 1 = 24). There are several ways to compute all 24 of these permutations. However, as was mentioned above, it is advantageous for a single procedure to be employed that is capable of determining all of the permutations for any core string, regardless of its length.

Therefore, the algorithm described here determines permutations in a recursive manner. It begins by finding all permutations based on the first letter in the first letter position (with the remaining letters and letter positions unchanged). The only possible permutation for this is the original sequence of letters from the input core string. This permutation is then used with the second letter and the second letter position to determine all permutations of the core string in which only the first two letters have been transposed around the first two letter positions (i.e., the rest of the letters remain in their original positions). These permutations are then used with the third letter and the third letter position to find all permutations of the core string involving transposition of only the first three letters around the first three letter positions. This set of permutations is then used with the fourth letter and the fourth letter position to calculate all permutations based on the first four letters and the first four letter positions of the core string. This process is repeated (i.e., the Nth letter of the word and the Nth letter position are used with the previously determined set of permutations), until all the letters of the core string have been processed and the complete set of permutations for the core string has been produced.

The procedure performed by the permutations algorithm is essentially the following:

Start:  Set K to 0.
Loop1:  Add 1 to K.
        Set n to 0.
Loop2:  Add 1 to n.
        Take the letter in position K and make a new permutation by swapping it
        with the letter in position K - n for every possible permutation based
        on K - 1 letters.
        If n is less than K then go back to Loop2.
        If K = the number of letters in the string then go to Exit.
        If n = K then go back to Loop1.
Exit:   Finished calculating all the permutations for this string.

For a four-letter core string this works as follows: Starting with K = 1 (i.e., the first letter of the core string), there are no swaps to make, since there can be no permutations based on K - 1 letters (when K = 1, K - 1 is letter position zero; hence, no permutations). The only possible permutation involving the first letter with the first letter position is the initial core string itself. So, for a four-letter core string ABCD, when K = 1, the list of permutations is equal to

ABCD

When K = 2, the letter in Position 2 (i.e., B) is swapped with the letter in Position 1 in all permutations based on Letter Position 1. This gives the following set of permutations:

ABCD
BACD

When K = 3, the third letter, C, is swapped with all letters in Position 2, and all letters in Position 1, in every permutation determined so far (i.e., both of those above), giving

ABCD
BACD
CBAD
ACBD
CABD
BCAD

Finally (for four-letter words), when K = 4, letter D must be swapped in Letter Positions 1, 2, and 3 in the above, giving

ABCD  DBCA  DBAC  DABC
BACD  ADCB  CDAB  CDBA
CBAD  ABDC  CBDA  CADB
ACBD  DACB  DCBA  DCAB
CABD  BDCA  ADBC  BDAC
BCAD  BADC  ACDB  BCDA

Hence, the complete set of permutations for the core string ABCD is derived. Further components of the anagram algorithm can now check these permutations against the source vocabulary (i.e., discarding permutations that do not appear in the source vocabulary if valid words are required, or discarding permutations that do appear in the source vocabulary if nonwords are required) and can perform any further filtering according to the specific requirements of experiments. For example, a filter can be applied to the output of the above algorithm that compares the letter positions of each permutation with those of the initial core string. Permutations can then be accepted or rejected on the basis of transposed letter content. In this way, large quantities of letter-string stimuli can be produced easily from any appropriate source vocabulary to satisfy the requirements of virtually any psychological experiment in which anagram stimuli are required.
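One way such a positional filter might look in C++ is sketched below (an illustration only, not the authors' code): the function accepts a permutation only when the letters it shares with the core string, position for position, are exactly those at the listed positions. The function name and the 0-based position convention are assumptions.

    #include <cstddef>
    #include <string>
    #include <vector>

    // Accept 'permutation' only if it keeps the core string's letters at exactly
    // the positions listed in 'fixed_positions' and moves the letters everywhere else.
    bool matches_positional_constraint(const std::string &core,
                                       const std::string &permutation,
                                       const std::vector<std::size_t> &fixed_positions)
    {
        if (core.size() != permutation.size())
            return false;

        for (std::size_t pos = 0; pos < core.size(); ++pos) {
            bool must_match = false;                         // Is this a position where the
            for (std::size_t f : fixed_positions)            // permutation must keep the core
                if (f == pos) { must_match = true; break; }  // string's letter in place?

            bool does_match = (core[pos] == permutation[pos]);
            if (must_match != does_match)                    // Reject letters that moved where
                return false;                                // they should stay, or stayed
        }                                                    // where they should move.
        return true;
    }

For example, matches_positional_constraint("slate", "steal", {0}) is true, because "steal" keeps only the first letter of "slate" in place, whereas "stale" would be rejected because it also keeps the third and fifth letters unmoved.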
An implementation of the permutations algorithm above in C/C++ is provided in Appendix C. We have developed a complete anagram-generating program, based on the previously described anagram algorithm, that also incorporates the permutations algorithm above. The complete Windows-based program, the source code, and assistance with development and implementation of filters are available free of charge from the authors. Please e-mail [email protected] for further details.

REFERENCES

Gibson, E. J., Pick, A., Osser, H., & Hammond, M. (1962). The role of grapheme-phoneme correspondence in the perception of words. American Journal of Psychology, 75, 554-570.
Jordan, T. R., Patching, G. R., & Milner, A. D. (2000). Lateralized word recognition: Assessing the role of hemispheric specialization and perceptual asymmetry. Journal of Experimental Psychology: Human Perception & Performance, 26, 1192-1208.
Mason, M. (1975). Reading ability and letter search time: Effects of orthographic structure defined by single-letter positional frequency. Journal of Experimental Psychology: General, 104, 146-166.
Massaro, D. W., & Klitzke, D. (1979). The role of lateral masking and orthographic structure in letter and word recognition. Acta Psychologica, 43, 413-426.
Massaro, D. W., Venezky, R. L., & Taylor, G. A. (1979). Orthographic regularity, positional frequency, and visual processing of letter strings. Journal of Experimental Psychology: General, 108, 107-124.
Reicher, G. M. (1969). Perceptual recognition as a function of meaningfulness of stimulus material. Journal of Experimental Psychology, 81, 275-280.
Skrandies, W., Reik, P., & Kunze, C. (1999). Topography of evoked brain activity during mental arithmetic and language tasks: Sex differences. Neuropsychologia, 37, 421-430.
Smith, R. W., & Kounios, J. (1996). Sudden insight: All-or-none processing revealed by speed-accuracy decomposition. Journal of Experimental Psychology: Learning, Memory, & Cognition, 26, 1443-1462.
Weldon, M. S. (1991). Mechanisms underlying priming on perceptual tasks. Journal of Experimental Psychology: Learning, Memory, & Cognition, 17, 526-541.
White, H. (1988). Semantic priming of anagram solutions. American Journal of Psychology, 101, 383-399.
Witte, K. L., & Freund, J. S. (1995). Anagram solution as related to adult age, anagram difficulty, and experience in solving crossword puzzles. Aging & Cognition, 2, 146-155.
NOTE

1. Grammatically, the term anagram refers to words derived from other words, but not to nonwords derived from any type of string. However, for simplicity, we do not make this distinction when using the term anagram in this article.

APPENDIX A

Web Browsers
Internet Explorer 6.0 is available free of charge from Microsoft Corporation at http://www.microsoft.com.
Netscape Navigator 6.0 is available free of charge from Netscape Communications at http://www.netscape.com.

On-Line Anagram Generators
The WWW Anagram Generator, by Eli Burke, is available for use free of charge at http://www.failte.demon.co.uk/anagrams.htm.
Brendan's On-Line Anagram Generator, by Brendan Connell, is available for use free of charge at http://www.mbhs.edu/~bconnell/anagrams.html.
Internet Anagram Server, by Anu Garg, is available for use free of charge at http://www.wordsmith.org/anagram.
Andy's Anagram Solver, by Andrew M. Gay, is available for use free of charge at http://www.ssynth.co.uk/~gay/anagram.html.
The Anagram Engine, by EasyPeasy Ltd., is available for use free of charge at http://www.easypeasy.com/anagrams/index.html.
Arrak Anagrams, by Arrak Software, is available for use free of charge at http://ag.arrak.fi/index_en.html.
Inge's Anagram Generator, by Inge Kristian Eliassen, is available for use free of charge at http://www.mi.uib.no/~ingeke/anagram.
Anagram Dictionary, by Luke Metcalfe, is available for use free of charge at http://www.orchy.com/dictionary/anagrams.htm.
Martin's Anagram Generator, by Martin Mamo, is available for use free of charge at http://freespace.virgin.net/martin.mamo/anagram.html.
Hiten Sonpal's Anagrams, by Hiten Sonpal, is available for use free of charge at http://spruce.evansville.edu/~hs4/anagrams.
Jumbles Unjumbled, by Brian R. Owen, is available for use free of charge at http://www.eecg.toronto.edu/~bryn/HTML/Jumbles.html.


APPENDIX B

Freeware and Shareware Anagram Programs
Winagram 1.0, by Jenies Technologies Incorporated, is available free and can be downloaded from http://eyenettools.com/Anagram.htm.
Logos, by Jeremy Riley, is available free and can be downloaded from http://homepage.ntlworld.com/jeremy.riley/Logos.
Anagram Generator 1.19, by Jason Rampe and Jack Rampe, is available free and can be downloaded from http://fractalchaos.freeyellow.com/anagram.htm.
The Electronic Alveary 1.6, by Ross Beresford, is available for £25 (single-user license) from Bryson Limited, 10 Wagtail Close, Twyford, Reading RG10 9ED, UK. An evaluation version can be downloaded from http://www.bryson.demon.co.uk.
Anagrams 2.0, by Andrew Trevorrow, is available for $15 (single-user license) and can be downloaded from http://www.trevorrow.com/anagrams/index.html.
ABC Genius, by Be-Best Software, is available for $20. A free evaluation version can be downloaded from http://www.bebesoft.com.
TeaBag 2.0, by Adrian Hlynka, is available free and can be downloaded from http://www.tiac.net/users/hlynka/anagrams.
PuzzLex 5.11, by N. Bentley, is available for £15. A free evaluation version can be downloaded from http://www.puzzlex.co.uk.

Commercial Anagram Programs
Anagram Genius 8.0, by Genius 2000 Software, is available for £24.99 from Genius 2000 Software at http://www.genius2000.com/ag.html.

APPENDIX C

The implementation of the permutations algorithm as a C/C++ function is presented below. The function takes as arguments the number of letters in the word, a pointer to an area of memory (referenced as a two-dimensional (2-D) array of char) big enough to hold all of the permutations, and an int that specifies how many permutations there will be in total (i.e., how many rows in the 2-D array, calculated by N!, where N is the number of letters in the input word). The word to be processed is supplied in row zero of the permutations array. New anagrams are calculated by copying appropriate (previously calculated) anagrams to the next empty row in the array and then swapping the appropriate letters.

#include <string.h>    /* Needed for strcpy; not in the published listing. */

// Function: do_permutations
//
// Calculates all of the permutations for a word. The word is supplied as
// the first element in the Permutations array.
//
// args: Number of letters in the word,
//       Permutations array (pointer to 2-D array of char),
//       Number of permutations
//
void do_permutations(int nLetters, char **pArray, int nPerm_lines)
{
    int nCR;                                     // Set up some
    int nCurrRow = 1;                            // local variables.

    for (int m = 2; m < (nLetters + 1); m++)     // Get all permutations
    {                                            // for the first 2 letters,
        nCR = nCurrRow;                          // then 3 letters, then
                                                 // 4 letters, etc.
        for (int i = 0; i < nCR; i++)
        {
            strcpy(pArray[nCurrRow], pArray[i]); // Copy the current letter
                                                 // string to the next row.

            for (int nSwap = 1; nSwap < m; nSwap++)   // Swap the letters in
            {                                         // the new row.
                char chTmp1 = pArray[nCurrRow][(m - nSwap)];
                char chTmp2 = pArray[nCurrRow][((m - nSwap) - 1)];
                pArray[nCurrRow][(m - nSwap)] = chTmp2;
                pArray[nCurrRow][((m - nSwap) - 1)] = chTmp1;

                nCurrRow += 1;                        // Increment the row counter.

                if (nCurrRow < nPerm_lines)           // If this isn't the last
                {                                     // permutation, copy the
                    strcpy(pArray[nCurrRow], pArray[(nCurrRow - 1)]);
                }                                     // letter string to the
                                                      // next row.
            }
        }
    }
}
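The published appendix presents only the function itself. A minimal driver of the kind sketched below (not part of the published listing; the example word and variable names are assumptions) shows how the N!-row permutations array the function expects might be allocated and filled before the call.

    #include <cstdio>
    #include <cstring>
    #include <vector>

    void do_permutations(int nLetters, char **pArray, int nPerm_lines);  // From Appendix C.

    int main()
    {
        const char *word = "slate";                  // Example core string from the text.
        int nLetters = static_cast<int>(std::strlen(word));

        int nPerm_lines = 1;                         // N! rows are needed in total.
        for (int i = 2; i <= nLetters; i++)
            nPerm_lines *= i;

        // One row per permutation, each long enough for the word plus its terminator.
        std::vector<std::vector<char> > storage(nPerm_lines,
                                                std::vector<char>(nLetters + 1, '\0'));
        std::vector<char *> rows(nPerm_lines);
        for (int r = 0; r < nPerm_lines; r++)
            rows[r] = &storage[r][0];

        std::strcpy(rows[0], word);                  // The word goes in row zero.
        do_permutations(nLetters, &rows[0], nPerm_lines);

        for (int r = 0; r < nPerm_lines; r++)        // Each row now holds one permutation,
            std::printf("%s\n", rows[r]);            // ready to be checked against the
                                                     // source vocabulary.
        return 0;
    }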

(Manuscript received July 25, 2001; revision accepted for publication July 8, 2002.)