CELEX: A Guide for Users

GAVIN BURNAGE

CELEX CENTRE FOR LEXICAL INFORMATION
Max Planck Institute for Psycholinguistics
Wundtlaan, XD Nijmegen, The Netherlands
Electronic mail (internet): celex@mpi.nl

First published in the Netherlands
© CELEX Centre for Lexical Information

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher.

Typeset using the TeX computer typesetting system. Printed by drukkerij SSN, Nijmegen. TeX is a trademark of the American Mathematical Society.

INTRODUCTION

There can be no doubt that lexicography is a very difficult sphere of linguistic activity. Many lexicographers have given vent to their feelings in this respect. Perhaps the most colourful of these opinions based on a lexicographer's long experience is that of J.J. Scaliger (16th-17th cent.), who says in fine Latin verses that the worst criminals should neither be executed nor sentenced to forced labour, but should be condemned to compile dictionaries, because all the tortures are included in this work.
LADISLAV ZGUSTA, Manual of Lexicography

The 1980s will one day be seen as a watershed in lexicography, the decade in which computer applications began to alter radically the methods and the potential of lexicography. Gone are the days of painstaking manual transcription and sorting on paper slips: the future is on disk, in the form of vast lexical databases, continuously updated, that can generate a dictionary of a given size and scope in a fraction of the time it used to take.
DAVID CRYSTAL, The Cambridge Encyclopedia of Language

CONTENTS

DATABASES AND LEXICONS
  Why use a database
LEXICON TYPES
  Dutch Lemmas
  Dutch Wordforms
  Dutch Abbreviations
  Dutch INL Corpus Types
  English Lemmas
  English Wordforms
  English COBUILD Corpus Types
  German Lemmas
  German Wordforms
  German Mannheim Corpus Types

DATABASES AND LEXICONS

This introduction tries to do two things. In the first section, for those who aren't familiar with the ideas and possibilities of databases and lexicons, there is a description of the way in which a computer database and lexicon is like, and more importantly unlike, a traditional paper dictionary. If you're already familiar with such things, you may like to skip ahead to the second section, where there is a description of each of the main lexicon types available to you in FLEX. Fundamental to this description is the difference between wordforms, the words we use in everyday speech and writing, and lemmas, words used to represent families of wordforms in the same way as bold type dictionary headings, which take the form of stems or headwords. Since the linguistic information available to you depends on the type of your lexicon, you should make sure you understand the differences between the various lexicon types before beginning your work. And when you start work with FLEX, the special program which helps you build and use your lexicons, you'll be better off for having read these sections carefully. In the third and last section of this introductory chapter you can find out how to log into CELEX using local, national and international computer networks.

WHY USE A DATABASE

Since we are dealing with words, we can start off by thinking of databases in terms of a paper dictionary. A book like the Van Dale Groot Woordenboek der Nederlandse Taal is essentially a long list of words, with information supplied alongside each word. The key to a dictionary is the alphabetical order of its word entries: you can only look up one particular word at a time and examine the information given for it. If you've got time, you can look at every page to find all the words with a certain grammatical code or pronunciation
but, quite understandably, most people don't do this unless they're really desperate. In its simplest form, a database can be like a dictionary: just a list of words and some information alongside each word.

The first important difference between a computer database and a paper dictionary is that the database uses different columns to store separate types of information, whereas the dictionary uses one paragraph of text and marks different sorts of information within that text by using different typefaces and coding systems, or by giving the information in a particular order. Dictionary text is fixed once it is printed. You can't move bits of an entry around or miss them out: you are presented with everything at once, and you may have to read a lot of irrelevant information before you find what you're looking for.

The columns which make up a database are much more regimented, but that, paradoxically, is what gives a database its flexibility. Each type of information keeps strictly to its own dedicated place, which means it's easier for the computer to locate and serve up one individual item, or several particular items, relating to each word that interests you. So you can look up a word and its word class code and pronunciation, say, without even having to glance at all the other information. The diagram below is a simple representation of how information is held in a database.

  Headword      Class   Phonetics
  aback         ADV     bk
  abacus        N       bks
  abandon       N       bndn
  abandon       V       bndn
  abandoned     A       bndnd
  abandonment   N       bndnmnt
  abase         V       beIs
  abasement     N       beIsmnt
  abash         V       bS
  abate         V       beIt

The crucial difference between a database and a dictionary is the flexibility that a computer can achieve: with the properly defined rows and columns, you can gather together different parts of the database and display the information in any way you like.

This illustration shows you three vertical columns, which are entitled Headword, Class and Phonetics, and ten horizontal rows, each of which displays information for each headword under the correct column
heading. A row thus contains every type of information for one word, while a column contains one specific type of information for every headword.

The illustration is, of course, only a very simple example. To get an idea of what the whole CELEX Dutch, English or German database might look like, imagine three hundred or so more column headings added to the right hand side, and a hundred thousand or so more rows added at the bottom. This diagram would then represent a small part of the top left corner of an enormous grid packed with lexical information. Experts have calculated that if you printed out the rest of this table in full, you would end up with a piece of paper metres wide and kilometres long, so you could probably walk round it in just under an hour. Using FLEX, which itself uses a database management system to access the information in the grid, you can extract tiny bits of information or long and detailed lists, just as you please. When you create a lexicon, you're essentially creating a little dictionary designed to your own specifications.

Unlike with a dictionary, you can use keys other than the headword when you look something up in a database. On a simple level, this means you can look up the verb walk instead of the noun walk. On another level, it means that you can get a list of all the verbs in the database, excluding all the other words which are not verbs. The individual paragraphs printed for each word in a dictionary are fixed, but the corresponding rows in a computer database can be moved about and rearranged just as you want them. So it's possible to create a lexicon like the one illustrated below by using FLEX restrictions. You simply state that you want to see all the words which have the word class code V, and you can then get as much information as you like about the verbs in your list. The example below shows a list of verbs with their pronunciations.

  Headword   Transcription
  abandon    bndn
  abase      beIs
  abash      bS
  abate      beIt

Since you've specified that you only want verbs in
your list, there's no need to put the word class code column on display. The computer uses it in preparing the list, but you don't have to look at it: you'd just get a list of Vs.

The possibilities for creating all sorts of lexicons are seemingly endless. You have hundreds of columns to choose from, most of which contain information you might want to inspect on your screen. Other columns contain information which can be used to control what is shown on your screen or in your file: the word class column in the illustration above, for example, or the Inflectional features columns under the morphology of Dutch wordforms, which simply say yes when a wordform does have a particular inflectional feature and no when it doesn't. A screen display of those columns isn't particularly interesting, but a file of wordforms created using the information they contain may well be very interesting.

And there are still more possibilities. If you want to build up a lexicon which contains words with, say, certain phonetic features in common, then FLEX lets you do it with the help of the pattern matcher. For example, you might want to see the words which contain, in a non-initial position, syllables beginning with a dental plosive or dental fricative. The required pattern, built around the character set tdTD, tells FLEX, when applied to a syllabified phonetic transcription column, to find transcriptions which consist of zero or more characters of any sort, followed by a syllable marker, followed by one of the dental phonemes t, d, T or D, followed by zero or more characters of any sort.

The resulting
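
The two ideas described above, restricting rows by word class and matching a pattern against a syllabified transcription, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FLEX itself: the rows, the transcriptions, the "-" syllable marker, the select function and the regular expression are all hypothetical stand-ins, and the regular expression is a Python analogue of the pattern described in the text rather than FLEX pattern syntax.

```python
import re

# Hypothetical rows: (headword, word class, syllabified transcription).
# The transcriptions are illustrative stand-ins, not CELEX data, and
# "-" is assumed to be the syllable marker.
ROWS = [
    ("aback", "ADV", "@-b&k"),
    ("abandon", "N", "@-b&n-d@n"),
    ("abandon", "V", "@-b&n-d@n"),
    ("abase", "V", "@-beIs"),
    ("abate", "V", "@-beIt"),
    ("abatement", "N", "@-beIt-m@nt"),
]

# Zero or more characters, a syllable marker, one of t d T D,
# then zero or more characters (a Python analogue, not FLEX syntax).
DENTAL_ONSET = re.compile(r".*-[tdTD].*")


def select(rows, word_class=None, pattern=None):
    """Return (headword, transcription) pairs satisfying the restrictions."""
    result = []
    for headword, cls, phon in rows:
        # The word class column controls the selection but is not displayed.
        if word_class is not None and cls != word_class:
            continue
        # The whole transcription must fit the pattern.
        if pattern is not None and not pattern.fullmatch(phon):
            continue
        result.append((headword, phon))
    return result


verbs = select(ROWS, word_class="V")
dental = select(ROWS, pattern=DENTAL_ONSET)

for headword, phon in verbs:
    print(f"{headword:<12}{phon}")
```

In this sketch "abate" and "abatement" are not selected by the pattern, because their t closes a syllable instead of following a syllable marker, while both "abandon" rows are selected because their second syllable begins with d.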
