English Linguistic Guide

ENGLISH LINGUISTIC GUIDE Despite the remarkable and irreversible changes that have come up on the English language since the AngloSaxon p erio d it has not yet reached a p oint of p erfection and stability suchaswe sometimes asso ciate with Latin of the Golden Age the language of Virgil Horace and others DR ROBERT BURCHFIELD The English Language in The Oxford Guide to the English Language What a patchwork has b een our old saxon by the bitter frost that nipp ed its early budding and the constant habit of b orrowing thence resulting the learned among usas well as the unlearned though in very dierentwaysare constantly made to feel The English language as wehaveitnow is not so much a coherentgrowth as a disturb ed organism PROF JOHN STUART BLACKIE Gaelic Self Taught CONTENTS ENGLISH ORTHOGRAPHY Numb er of sp ellings Sp elling number Language Co de Frequency of Sp elling Sp elling Diacritics Reverse transcriptions Sp elling columns Transcriptions for corpus typ es Transcriptions for English lemmas Sp ellings for English headwords Sp ellings for syllabied headwords Transcriptions for wordforms Sp ellings for plain wordforms Sp ellings for syllabied wordforms ENGLISH PHONOLOGY Numb er of pronunciations Pronunciation number Status of pronunciation Phonetic transcriptions Computer phonetic character sets Plain transcriptions Syllabied transcriptions Stressed and syllabied transcriptions Example transcriptions Phonetic patterns ENGLISH MORPHOLOGY Morphology of English lemmas How to segment a stem Typ es of analyses The Derivation The Comp ound The Derivational Comp ound The NeoClassical Comp ound The NounVerbAx Comp ound How to assign an analysis The NounVerbAx Comp ound Status and language co des Derivationalcomp ositional information Analysis typ e co des Immediate segmentation Complete segmentation at Complete segmentation hierarchical Other co des Morphology of English wordforms Inectional features Typ e of ection Inectional Transformation ENGLISH SYNTAX Word class co des letters or numbers Sub classication Y or N Sub classication nouns Sub classication verbs Sub classication adjectives Sub classication adverbs Sub classication numerals Sub classication pronouns Sub classication conjunctions ENGLISH FREQUENCY Frequency information for lemmas and wordforms Frequency information from written and sp oken sources Written corpus information Sp oken corpus information Frequency information for COBUILD corpus typ es Frequency information for COBUILD written corpus typ es Frequency information for COBUILD sp oken corpus typ es ENGLISH ORTHOGRAPHY Detailed and varied information is available on the ortho graphic forms of headwords and wordforms You can cho ose from a range of transcriptions they can b e syllabied or un syllabied they can include or omit diacritics as explained below or in some cases they come with the order of the letters reversed or with the letters sorted alphab eticallyIn addition there are columns whichtellyou the number of letters or syllables a particular transcription contains This flex window is the menuyou see for a lemma or a wordform lexicon when you cho ose the Orthography option of the rst ADD COLUMNS menu ADD COLUMNS Number of spellings Spelling number N Language code Frequency of spelling Spelling TOP MENU PREVIOUS MENU NUMBER OF SPELLINGS This option in the ADD COLUMNS menu is a column which tells you how manyways eachlemmawordform or abbrevi ation according to the typ e of lexicon you are using can b e sp elt For the verb lemma generalize this column has the value which means there are twopossibleways of sp elling it Unless you construct a restriction on your lexicon gener alize will o ccur twice one row using the form generalizethe other using the form generalise This column is particularly useful when you wanttoidentify words whichhave sp elling variants To exclude from your english linguistic guide lexicon all items whichonlyhave one p ossible sp elling con taining instead those whichcanbespeltina number of ways you can construct an expression restriction which simply states that the numb er of sp ellings must b e greater than OrthoCnt The flex name and description of this column are as follows OrthoCnt Number of spellings OrthoCntLemma SPELLING NUMBER Just as the very rst available ADD COLUMNS option is a number which uniquely identies each lemma or wordform according to the typ e of lexicon you are using so this column uniquely identies every sp elling to b e found for each lemma or wordform If you are using a lemma lexicon the sp elling variants are given in the form of headwords with or without syllable markers For example the verb generalize has two sp ellings one is the form generalize and another is the form generalise These have the sp elling numb ers and resp ectivelyIfyou use a syllabied stem representation in place of the plain headword representation sp elling takes the form general ize and sp elling the form generalise This means you can use the universal sequence number to identify a particular lemma or wordform dep ending on the typ e of lexicon you are using and then the sp elling number to identify the dierent individual sp ellings used for each lemma Usually no preference is indicated by the sp elling numb ers eachvariant has generalize is as valid as gener aliseHowever the numb er sp elling is always an acceptable British from and any American variants are always given a higher numb er Thus the lemma monologue has the British form monologue as its numb er sp elling and the American form monolog as its numb er sp elling You can nd out whether a sp elling is British or American by using the Or thoStatus column describ ed b elow One imp ortant p oint to rememb er is that the sp elling number can b e used to eliminate unwanted rows from your lexicon If you only want to see one sp elling for each lemma or Sp elling number wordform you should construct a restriction which states that only rows with a sp elling numb er equal to are to b e included in the form OrthoNum If you dont do this you usually end up with lexicons that are to o long b ecause they needlessly rep eat certain pieces of information Takethe example generalize again only this time imagine you wantto know its pronunciation rather than the various ways it can b e sp elt You create a lexicon with three columns one giving the sp elling numb er one giving the orthography of the stem and one giving the pronunciation of the headword Without the restriction flex returns tworows for generalize Sp elling Number Headword Pronunciation generalize dZEnrl aIz generalise dZEnrl aIz generalize dZEnrla Iz generalise dZEnrla Iz This is unnecessary since you are interested only in the p ossible pronunciations The extra row merely gives you a sp elling variant while the pronunciations remain the same When you include the restriction OrthoNum however only the rows with the numb er sp elling are included Sp elling Number Headword Pronunciation generalize dZEnrl aIz generalize dZEnrla Iz And of course the more lemmas your lexicon contains the greater the numberofeliminated lines b ecomes simply as a consequence of adding this imp ortant restriction If you are particularly interested in sp elling variation then do not add this OrthoNum restriction that wayyou get to see all the variant orthographic forms of each lemma Otherwise whenever you just want to use a simple ortho graphic transcription as a means of representing the lemma in your lexicon always rememb er to insert it The flex name and description of this column are as follows OrthoNum Spelling number OrthoNumLemma english linguistic guide LANGUAGE CODE For every dierent sp elling there is a co de which tells you whether it is an acceptable British form B or whether it is only ever an American form A This applies to lemmas and wordforms A British sp elling is always the rst one given that is its sp elling numb er is always while one which only o ccurs in American English is never the rst form and always has a sp elling numb er greater than Sp elling Status Example typ e co de British B adviser American A advisor Table Orthographic status co des for English sp ellings The flex name and description of this column are as follows OrthoStatus Status of spelling OrthoStatusLemma FREQUENCY OF SPELLING There are gures available which tell you how frequently each sp elling of each lemma or wordform o ccurs in the cobuild corpus along with deviation gures whichgive a range of error for each frequency They dier from the main frequency gures in that they are sp ecic to one sp elling whereas the frequency columns prop er refer to the more general frequency counts for the whole lemma or for eachwordform To arrive at these gures a count has to b e made of the numb er of times each string o ccurs in the current million word version of the cobuild corpus This lets you see that the string b eauty for example o ccurs times and that truth o ccurs times and these gures are the sp elling frequencies for the wordforms which are sp elt that way Usually but not always this string frequency is the same as the wordform frequency and its then p ossible to formulate sp elling frequencies for each lemma This simply means adding together the frequencies for eachwordform in each inectional paradigm Thus the sp elling frequency for the Frequency of Sp elling lemma b eauty is the sp elling frequency for the wordform b eauties plus

English Linguistic Guide

Department of English and American Studies English Language And

Problem of Creating a Professional Dictionary of Uncodified Vocabulary

Towards a Conceptual Representation of Lexical Meaning in Wordnet

Inferring Parts of Speech for Lexical Mappings Via the Cyc KB

Symbols Used in the Dictionary

DICTIONARY News

Japanese-English Cross-Language Headword Search of Wikipedia

Download in the Conll-Format and Comprise Over ±175,000 Tokens

Phonemic Similarity Metrics to Compare Pronunciation Methods

Towards Structuring an Arabic-English Machine-Readable Dictionary Using Parsing Expression Grammars

Prefixes and Suffixes

Also Called Back-Derivation, Retrograde Derivation Or Deaffixation) Is Often Described As One of the Minor Word-Formation Processes