How many words?

It is often said that the English language is particularly rich in vocabulary, but to make such a statement we need to know what words to count and what counts as a word.

DAVID CRYSTAL

ENGLISH TODAY No. 12 - OCTOBER 1987

DAVID CRYSTAL read English at University College London, and has since held posts in [...] at the University College of [...], Bangor, and at the [...], where he taught for twenty years. He works currently as a writer, lecturer, and broadcaster on language and linguistics, maintaining his academic links through an honorary professorship in linguistics at Bangor. He is the editor of Linguistics Abstracts and Child Language Teaching and Therapy. Among his recent publications are Listen to Your Child, Who Cares About English Usage?, and Linguistic Encounters with Language Handicap. His most recent book is the Cambridge Encyclopedia of Language.

How many words are there in English? And how many of these words does a native speaker know? These apparently simple little questions turn out to be surprisingly complicated. In answer to the first, estimates have been given ranging from half a million to over 2 million. In answer to the second, the estimates have been as low as 10,000 and over ten times that number.

People are, it seems, quite happy to drop all kinds of figures into their lectures and publications (see Panel 1). The figures give the impression of great precision - though it should be noted that they are usually accompanied by such emptying expressions as 'approximately', 'on average', or 'it is thought'. Nonetheless, the vagueness does not stop organizations offering courses and exercises (at a price) that will enable readers to 'increase their word power' - without ever providing these readers with the opportunity of discovering what their current word power actually is.

How can we throw light on this apparently confusing area? Let us begin with the question of how many words there are in English - a topic which has attracted almost as many estimates as estimators. The question is complex for two reasons. It partly depends on what you count as an English word, and partly on where you go looking for them.

Panel 1. Varying estimates

Shakespeare had one of the largest vocabularies of any English writer, some 30,000 words. (Estimates of an educated person's vocabulary today vary, but it is probably about half this, 15,000.) (Robert McCrum et al., The Story of English, 1986, p. 102)

He [Shakespeare] has the largest vocabulary of any writer in English, approximately 34,000 words, which is about double what an educated person uses today in their lifetime. (John Barton, in The Story of English episode 3)

At two years old the average vocabulary is about three hundred words. By the age of five it is about five thousand. By twelve it is about 12,000. And there for most people it rests - at the same size repertoire employed by a popular daily newspaper. (Jane Bouttell, The Guardian, 12 August 1986)

Graduates have an average vocabulary of about 23,000 words, fostered, I would contend, by intensive tutoring. (Jane Bouttell, also [...])

What counts as a word?

Consider the problems, if someone asked you to count the number of words in English. You would immediately find thousands of cases where you would not be sure whether to count one word or two. In writing, it is often not clear whether something should be written as a single word, as two words, or hyphenated. Is it washing machine or washing-machine? school children or schoolchildren? flower pot, flower-pot or flowerpot? Would you count all the items beginning with foster as new words: foster brother, foster care, foster child, foster father, foster home, etc? Or would you treat them as combinations of old words: foster + brother, care, and so on. This is a big problem for the dictionary-makers, who often reach different conclusions about what should be done.

What would you do with get at, get by, get in, get off, get over, and the dozens of other cases where get is used with an additional word? Would you count get once, for all of these, or would you say that, because these items have different meanings (get at, for example, can mean 'nag'), they should be counted separately? In which case, what about get it?, get your own back, get your act together, and all the other 'idioms'? Would you say that these had to be counted separately too? Would you count kick the bucket (meaning 'die') as three familiar words or as a single idiom? It hardly seems sensible to count the words separately, for kick has nothing to do with moving the foot, nor is bucket a container.

If you let the meaning influence you (as it should), then you will find your word count growing very rapidly indeed. But as soon as you do this, you will start to worry about other meanings, even in single words. Is there a single meaning for high in high tea, high priest and high season? Is the lock on a door the same basic meaning as the lock on a canal? Should ring (the shape) be kept separate from ring (the sound)? Are such cases 'the same word with different meanings' or 'different words'? These are the daily decisions that any word-counter (or dictionary compiler) must make.

Whose English are we counting?

Sooner or later, the question would arise about the kind of vocabulary to include in your count. There wouldn't be a difficulty if the words were part of standard English - used by educated people throughout the English-speaking world. Obviously these have to be counted. But what about the vast numbers of words which are not found everywhere - words which are restricted to a particular country (such as Canada, Britain, India, or Australia), or to a particular part of a country (such as Wales, Yorkshire or [...])? They will include words like stroller (= push-chair) and station (= stock farm) from Australia, bach (= holiday cottage) and pakeha (= white person) from New Zealand, dorp (= village) and indaba (= conference) from South Africa, cwm (= valley) and eisteddfod (= competitive arts festival) from Wales, faucet (= tap) and fall (= autumn) from North America, fortnight (= two weeks) and nappy (= baby wear) from Britain, loch (= lake) and wee (= small) from Scotland, dunny (= money) and duppy (= ghost) from Jamaica, lakh (= a hundred thousand) and crore (= ten million) from India, and many more.

Regional dialect words have every right to be included in an English vocabulary count. They are English words, after all - even if they are used only in a single locality. But no one knows how many there are. Several big dictionary projects exist, cataloguing the local words used in some of these areas, but in many parts of the world where English is a mother-tongue or second language, there has been little or no research. And the smaller the locality, the greater the problem. Everyone knows that 'local' words exist: 'we have our own word for such-and-such round here'. Local dialect societies sometimes print lists of them, and dialect surveys try to keep records of them. But surveys are lengthy and expensive enterprises, and not many have been completed. As a result, most regional vocabulary - especially that used in cities - is never recorded. There must be thousands of distinctive words inhabiting such areas as Brooklyn, the East End of London, San Francisco, Edinburgh and Liverpool, none of which has ever appeared in any dictionary.

The more colloquial varieties of English - and slang, in particular - also tend to be given inadequate treatment. In dictionary-writing, the tradition has been to take material only from the written language, and this has led to the compilers concentrating on educated, standard forms. They commonly leave out non-standard expressions, such as everyday slang and obscenities, as well as the slang of specific social groups, such as the army, sport, thieves, public school, banking, or medicine. Eric Partridge once devoted a whole dictionary to this world of 'slang and unconventional English'. Some of the words it contained were thought to be so shocking that for several years many libraries banned it from their open shelves!

Keeping track of slang, though, is one of the most difficult tasks in vocabulary study, because it can be so shifting and short-lived. The life-span of a word or phrase may be only a few years - or even months. The expression might fall out of use in one social group, and reappear some time later in another. Who knows exactly how much use is still made today of such early jazz-world words as groovy, hip, square, solid, cat, and have a ball? Or how much use is made of the new slang terms derived from computers, such as he's integrated (= organised) or she's high res (= very alert, from 'high resolution')? Which words for 'being drunk' are now still current: canned, blotto, squiffy, jagged, paralytic, smashed ... ? And how do we get at the vast special vocabulary which has now grown up in the drugs world? Word-lovers from time to time make collections, but the feeling always exists that the items listed are only the tip of a huge lexical iceberg.

Some marginal cases

Estimating the vocabulary size of English is further complicated by the existence of hundreds of thousands of uncertain cases - words which you wouldn't feel were part of the 'central' vocabulary of the language. On the other hand, you might well feel unhappy about leaving them out.

What would you do with all the abbreviations that exist, for example? A recent dictionary of abbreviated words (the impressive [...], Initialisms & Abbreviations Dictionary published by the Gale Research Company, 11th edition, 1987) lists over 400,000 entries. It includes old and familiar forms such as flu, hi-fi, deb, FBI, UFO, NATO and BA. There are large numbers of new technical terms, such as VHS (the video system), AIDS, and all the terms from computerspeak (PC, RAM, ROM, BASIC, bit) and space travel (SRB - solid rocket boosters, OMS - orbital manoeuvring system, etc.). And there are thousands of coinages which have a restricted regional currency, such as RAC (= Royal Automobile Club), AAA (= Automobile Association of America), or reflect local organisations and attitudes - with varying levels of seriousness - such as MADD (= Mothers Against Drunk Driving) and DAMM (= Drinkers Against Mad Mothers).

Because these forms are dependent on 'bigger' words for their existence, you might well decide not to include them in your count. On the other hand, you could argue that they are often more important than the original words - and that the original words may not even be remembered or known (as many people find with such forms as AIDS). Personally, I would include them in my word count - but some dictionaries do not.

There are other marginal cases. What would you do with the names of people, places and things in the world? Should London, Whitehall, Paris, Munich, and Spain be included in your word count? You might think they should - especially knowing that many of these words are different in other languages (such as München and España). However, it isn't usual to include them as part of the vocabulary of English, because the vast majority can appear in any language. Whichever language you speak, if you walk down Pall Mall, you can refer to where you are by using the words Pall Mall in your own language. The old music hall repartee relied on this point:

A: I say, I say, I say. I can speak French.
B: You can speak French? I didn't know that. Let me hear you speak French.
A: Paris, Marseilles, Nice, Calais, Jean-Paul Sartre ...

The same applies to the names of people, animals, objects (such as trains and boats), and so on. Proper names aren't part of any one language: they are universal. However, it's important to note the usages where these words do take on special meanings - as in Has Whitehall said anything about this? Here, Whitehall means 'the government'; it isn't just a place name. Dictionaries would usually include this kind of usage in their list. But it's not at all clear how many uses of this kind there are.

Fauna and flora present a further type of difficulty. Around a million species of insects have already been described, for example. Which means that there must be around a million

designations available to enable English-speaking entomologists to talk about their subject. How much of this can be included in our word count? The largest dictionaries already include hundreds of thousands of technical and scientific terms, but none of them includes more than a fraction of the insect names - usually just the most important species. Add this total to that required for birds, fish, and other animals, and the theoretical size of English vocabulary increases enormously.

In the light of these problems, it may not be possible to arrive at a satisfactory total for English vocabulary. But one thing is plain: the core vocabulary, as reflected in the entry totals cited for such works as the unabridged Oxford English Dictionary or Webster's Third New International, is a considerable underestimate (see Panel 2). These totals focus on a figure of about half a million. However, if we allow in some of the above categories, this figure must be increased by a factor of three or four. I would never want to go below one million, for an estimate of English vocabulary, and with very little persuasion I would readily accept two.

Panel 2. Lexical coverage of three unabridged US dictionaries

A hint of the extent to which any given dictionary underestimates the total word-stock of English can be obtained from the table below, which lists the bold-face words found as initial items in the entries of three unabridged American dictionaries (variants later in the entry's opening line have been excluded). Of the 48 possible items listed, coverage ranges from 70% to 35%. Only nine words appear in all three dictionaries - less than 20% overlap. This figure is not much increased even if RH's proper names are excluded from consideration.

The same story emerges if pairs of dictionaries are compared. There is an overlap of 13 between WIII and RH, of 11 between RH and WBE, and of 10 between WIII and WBE, suggesting that, if this sample is representative, the average overlapping coverage (as defined by headwords) between any two dictionaries might be as low as 25%.

[Table: an alphabetical run of 48 headwords, from saba to Sabbaticalness, checked against three columns: Webster III, World Book Encyclopedia, Random House. Column totals, in that order: 34, 22, 17.]

How large is your vocabulary?

There seems to be no more agreement about the size of an adult's vocabulary than there is about the total number of words in English. Estimates do indeed vary, as we have seen. Part of the problem, I imagine, is what is meant by 'educated'. But whether we are educated or not, how can we find out the truth of the matter?

We might tape record everything we said and heard for a month, or a year, and keep a record of everything we read and wrote. Then we could tabulate all the words, mark which ones we understood and which we failed to understand, and count up. But life is too short.

An alternative, which can be carried out in a couple of hours, gives a fairly good idea. You take a medium-sized dictionary - one which contains about 100,000 entries - and test your knowledge of a sample of the words it contains. A sample of about 2% of the whole, taken from various sections of the alphabet, gives a reasonable result. In other words, if such a dictionary were 2000 pages long, you would have a sample of 40 pages. Use the following procedure.

• It's wise to break this sample down into a series of selections, say of 5 pages each, from different parts of the dictionary. It wouldn't be sensible to take all 40 pages from the letter U, for instance, as a large number of these words would begin with un-, and this would hardly be typical. On the other hand, prefixes are an important aspect of English word formation, so we mustn't exclude them entirely. Similarly, it would be silly to include a section containing a large number of scientific words (such as the section containing electro-), or rare words (such as those beginning with X).

• One possible sample, which tries to balance various factors of this kind, would take sections of 5 complete pages from each of the following parts of the dictionary: C-, EX-, J-, O-, PL-, SC-, TO- and UN-. Begin with the first full page in each case - in other words, don't include the very first page of the C section, if the heading takes up a large part of the page; ignore the first few EX- entries, if they start towards the bottom of a page; and so on.

• Draw up a table of words like the one in Panel 3. On the left-hand side write in the headwords from the dictionary, as they appear. Do not include any parts of words which the dictionary might list, such as cac- or -caine, but do include words with affixes, such as cadetship alongside cadet, even if the former is listed only as -ship within the entry on cadet. In short, include all items in bold face within an entry. Include phrases or idioms (e.g. call the tune). Ignore alternative spellings (e.g. caesarian/cesarian).

• The table has two columns: the first asks you to say whether you think you know the word, from having heard or seen it used; the second whether you think you actually use it yourself in your speech or writing. This is the difference between passive and active vocabulary. Within each column, there are three judgments to be made. For passive vocabulary, you ask 'Do I know the word well? vaguely? or not at all?'. For active vocabulary, you ask: 'Do I use the word often? occasionally? or not at all?'. Place a tick in the appropriate column. If you are uncertain, use the final column. You may need to look at the definition or examples given next to the word, before you can decide. Ignore the number of meanings the word has: if you know or use the word in any of its meanings, that will do. (Deciding how many meanings of a word you know or use would be another - much vaster - project!)

• When you've finished, add up the ticks in each column, and multiply the total by 50 (if the sample was 2% of the whole). The total in the first column is probably an underestimate of your vocabulary size. And if you take the first two columns together, the total will probably be an overestimate.

Panel 3. [Table: part of one person's vocabulary tally, based on a Dictionary of the English Language (90,000+ headwords). + = known/used. Column groups: KNOWN (Well, Vaguely, No) and USED (Often, Occasionally, Never). Headwords shown run from cablese to cabstand; the individual ticks are not recoverable.]

This procedure of course doesn't allow for people who happen to know a large number of non-standard words that may not be in the dictionary (such as local dialect words). If you are such a person, the figures will have to be adjusted again - but that will be pure guesswork.

Here are the estimates for the first two columns, as filled in by a female office secretary in her 50s:

WORDS KNOWN
Well      Vaguely        Total
30,050    8,250          38,300

WORDS USED
Often     Occasionally   Total
16,300    15,200         31,500

The results are interesting. Note that passive vocabulary is much larger than active. This will always be the case. You will also find that it's easier to make up your mind about the words you definitely know than the words you frequently use.

Even allowing for wishful thinking, sampling bias, and other such factors, it would seem that some of the widely quoted estimates of our vocabulary size are a long way from reality. Comparisons with Shakespeare or other past writers are meaningless, given the enormous increase in English vocabulary since his day. What I would now very much like to know is (a) whether this procedure can be tightened up in some way, or whether a better procedure can be suggested? and (b) what range of totals emerge from people of varying backgrounds and ages?

ET will publish in due course a range of vocabulary estimates from readers who have tried out the procedure for themselves (or, if they prefer, have tried it out on a 'friend'). If you do send in these details, please make sure you include data on age, educational background, and occupation, as well as the dictionary you used. The results will always be interesting, and may be surprising. If nothing else, it can provide you with a good topic for parties. There really isn't a way of capping such observations as 'I have an active vocabulary of approximately 38,600 words'. It will be a safe conversation-stopper - unless, that is, you encounter another ET reader at the same party.
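For readers who would rather let a computer do the multiplying, the arithmetic of the sampling procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Tally and estimate_vocabulary are invented for the sketch, and the tick counts in the example are back-calculated from the secretary's published totals (each divided by 50), not copied from her actual table.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Ticks counted on the sampled pages (hypothetical structure)."""
    known_well: int
    known_vaguely: int
    used_often: int
    used_occasionally: int


def estimate_vocabulary(tally: Tally, sample_fraction: float = 0.02) -> dict:
    """Scale sample ticks up to whole-dictionary estimates.

    A 2% sample means each tick stands for 50 words, as in the article.
    """
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    scale = 1 / sample_fraction

    def scaled(ticks: int) -> int:
        # Round to whole words after scaling.
        return round(ticks * scale)

    return {
        # First column alone: probably an underestimate.
        "known_well": scaled(tally.known_well),
        # First two columns together: probably an overestimate.
        "known_total": scaled(tally.known_well + tally.known_vaguely),
        "used_often": scaled(tally.used_often),
        "used_total": scaled(tally.used_often + tally.used_occasionally),
    }


if __name__ == "__main__":
    # Assumed tick counts, inferred from the published totals divided by 50.
    secretary = Tally(known_well=601, known_vaguely=165,
                      used_often=326, used_occasionally=304)
    for label, words in estimate_vocabulary(secretary).items():
        print(f"{label}: {words:,}")
```

With a different sample size, only sample_fraction changes: a 1% sample would make each tick stand for 100 words. The sketch says nothing about dialect or other non-standard words missing from the dictionary, which the procedure itself leaves to guesswork.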
