Orthographic and Morphological Problems in Headword Identification, Selection and Presentation in ALLEX
Total Page:16
File Type:pdf, Size:1020Kb
http://lexikos.journals.ac.za Orthographic and Morphological Problems in Headword Identification, Selection and Presentation in ALLEX Herbert Chimhundu, University of Zimbabwe, Harare, Zimbabwe Abstract: This article discusses aspects of an on-going lexical computing project at the University of Zimbabwe, known as ALLEX, the acronym for African Languages Lexical Project. ALLEX is a collaborative, multi-faceted, long-term, computer-aided lexicography project that is intended to produce a series of dictionaries, glossaries and other language reference works in the indigenous languages of Zimbabwe, starting with the first ever monolingual Shona dictionary, which is already at an advanced stage of preparation, and which will also be the first corpus-based dictionary in Zimbabwe. The article confines itself to problems relating to word formation processes in Shona, specifically with reference to the lexicographers' need to ensure consistency in (i) identifying word ) 1 fonns as headwords in the running texts of the corpus, (ii) selecting from among these headwords 1 0 to decide which ones to enter in the dictionary, and (iii) presenting the entries in the standard 2 d orthography. e t a First, an outline is given of the project's baCkground, objectives, priorities, guidelines and d ( r work in progress. The article then focuses on specific problems encountered, and discusses these, e h s and some of the solutions, in the light of advances in computer lexicography, with particular i l b reference to concordancing. It will become evident that the problems encountered by the ALLEX u P Team have to do with the unlimited capacity of the Shona language's basic and derivational word e h t fonnation processes, which it shares with the other languages in the Bantu family. Therefore, it y b will be suggested that these problems, and the solutions that are being worked out, have much d e wider implications that go beyond Shona and Zimbabwe. t n a r g Keywords: AFRICAN LANGUAGES, HEADWORD IDENTIFICATION, HEADWORD e c n PRESENTATION, HEADWORD SELECTION, LEXICAL PROJECT, LEXICOGRAPHY, e c i l MORPHOLOGICAL PROBLEMS, ORTHOGRAPHIC PROBLEMS r e d n Opsomming: Ortografiese en morfologiese probleme by lemma-identifika u y sie, -seleksie, en -aanbieding in ALLEX. Hierdie artikel bespreek aspekte van 'n voort a w e gesette leksikale rekenariseringsprojek by die Universiteit van Zimbabwe, bekend as ALLEX, die t a akroniem vir African Languages Lexical Project. ALLEX is 'n gesamentlike, veelfasettige, lang G t tennyn-, rekenaargesteunde leksikografieprojek wat hom dit ten doel stel om 'n reeks woorde e n i boeke, woordelyste en ander taalnaslaanwerke in die inheemse tale van Zimbabwe te produseer. b a S y b d e c u d o r p e R http://lexikos.journals.ac.za Orthographic and Morphological Problems in Headword Identification 189 Die projek begin met die eerste eentalige Shonawoordeboek, wat reeds in 'n gevorderde stadium van bewerking is en ook die die eerste korpusgebaseerde woordeboek in Zimbabwe sal wees. Die artikel beperk hom tot probleme in verband met woordvormingsprosesse in Shona, spe sifiek met verwysing na die leksikograaf se behoefte om eenvormigheid te handhaaf by (i) die identifikasie van woordvorme as lemmas in die lopende teks van die korpus, (ii) die seleksie uit hierdie lemmas om te besluit watter om in die woordeboek in te sluit, en (iii) die aanbieding van hierdie inskrywings in die standaardortografie. Ten eerste word 'n skets gegee van die projek se agtergrond, doelstellings, prioriteite, riglyne en die werk wat aan die gang is. Dan fokus die artikel op spesifieke probleme wat teegekom is, en bespreek hulle en sommige van die oplossings in die lig van rekenaarleksikografie, met besondere verwysing na konkordansieskepping. Dit sal duidelik word dat die probleme. wat deur die ALLEX-span teengekom is, te doen het met die Shonataal se onbeperkte vermoe tot basiese woord vorming en woordvorming deur afieiding, 'n eienskap wat dit met die ander tale in die Bantoe familie deel. Daarom word daar aangedui dat hierdie probleme en die oplossings wat uitgewerk word, baie wyer implikasies het wat verder reik as Shona en Zimbabwe. Sleutelwoorde: AFRIKATALE, LEKSIKALE PRO]EK, LEKSIKOGRAFIE, LEMMASELEK SIE, LEMMA-AANBIEDING, LEMMA-IDENTIFIKASIE, MORFOLOGIESE PROBLEME, ORTO GRAFIESE PROBLEME 1. Introduction ) 1 1 0 2 This article discusses only a few aspects of an on-going lexical computing pro d e ject at the University of Zimbabwe, coordinated by the present writer and t a d known as ALLEX, the acronym for African Languages Lexical Project. ALLEX ( r e is a collaborative, multifaceted, long-term computer-aided lexicography project h s i that is intended to produce a series of dictionaries, glossaries and other refer l b u ence works in the indigenous languages of Zimbabwe. The first volume P e planned is a monolingual Shona dictionary, which is already at an advanced h t stage of preparation. The article discusses some of the problems encountered y b during work in progress, but it does not go into theoretical, computational, d e t defining and editorial issues, all of which, it is hoped, will be handled at n a r appropriate stages by different members of the ALLEX Team. g e The current article confines itself to p~oblems relating to the word forma c n e tion processes of Shona and its adopted system of spelling and word division, c i l insofar as these are relevant to the design of the dictionary, particularly at the r e d macrostructural level, and specifically with reference to the lexicographers' n u need to ens~.tre consistency in how they identify word forms as headwords in y a the running texts of the corpus, how they select from among all these head w e t words to decide which ones to enter in their dictionary, and then how to pre a G sent these entries in the standard orthography. t e What follows below is partly based on the writer's previous experience in n i b compiling the Addendum for Hannan's Standard Shona Dictionary (1981), but the a S y b d e c u d o r p e R http://lexikos.journals.ac.za 190 Herbert Chimhundu greater part is based on more recent experiences in ALLEX. From the project's technical reports (Chimhundu 1992c, 1993a and 1993b), related documents and correspondence, an outline is drawn of the project's background, objectives, priorities, guidelines and work in progress. The article then focusses on spe cific problems relating to headword identification, selection and presentation that are relevant to the methodology that is being used to produce the first corpus-based dictionary in Zimbabwe, which will also be the first monolingual dictionary in Shona, the language spoken by at least three-quarters of the pop ulation. It will be suggested that the problems encountered by the ALLEX Team have to do with the unlimited capacity of the Shona language's basic and derivational word formation processes, which it shares with the other lan guages in the Bantu family. Therefore, the problems encountered and the solutions being worked out in ALLEX at present have much wider implications that go beyond Shona and Zimbabwe. These problems range from how the conjunctive spelling further com pounds the problem of identifying word forms generally; what criteria to use to determine qualification for selection of monomorphemic and multimor phemic word forms as lexical entries; how to create concordances or, more specifically, how to select key-words for concordancing of running texts; how to use the concordances for identification and selection of lexical items by the agreed criteria; and then, how to present these in the dictionary without vio ) 1 lating the rules of the orthography in Standard Shona and the lexicographical 1 0 conventions already established in existing dictionaries. 2 d e Defining as such will not be discussed, as this is a different problem alto t a gether which, it is hoped, the writer will come back to in a follow-up article. d ( r Obviously, because of the very close similarity in the morphosyntactic e h s structure of the languages of the Bantu family, and despite the fact that some of i l b them use conjunctive while others use disjunctive spelling, the problems that u P are being encountered and the solutions that are being worked out in ALLEX e h t should generate much wider interest. It should also be of much greater signifi y b cance and application in the region, especially since, to the best knowledge of d e the writer, ALLEX is a pioneering collaborative project in computer-aided lexi t n a cography in the indigenous languages of the region, and perhaps of Africa as a r g whole, that is unique in terms of both design and methodology. e c n e c i l r e 2. The ALLEX Project d n u y a When the ALL EX Project was launched in September 1992 there was neither a w e language policy nor a lexicographical policy in Zimbabwe, and bilingual dic t a tionaries for second-language users predominated, as they still do, in the whole G t e region: n i b a S y b d e c u d o r p e R http://lexikos.journals.ac.za Orthographic and Morphological Problems in Headword Identification 191 The indigenous African languages display a lack of comprehensive monolingual lexicographical description (Gouws 1990: 55). The timing of the rather belated Zimbabwean effort has been fortuitous in the sense that the launch of ALLEX was preceded by advances in computer tech nology. These advances concerned specifically automated text processing, that are now providing African language lexicographers with at least a theoretical chance to catch up, that is, if we ignore the futuristic automated dictionary that is already being talked about in the technolOgically advanced countries as a potential rival for the conventional form of the dictionary that we all know of as a book.