Issue Editor 1 R
Total Page:16
File Type:pdf, Size:1020Kb
DECEMBER 1989VOL. 12 NO. 4 a quarterlybulletin of the IEEE ComputerSociety technicalcommittee on Data Engineering CONTENTS Letterfrom the Issue Editor 1 R. King Problems and Peculiarities of ArabicDatabases 2 A.S. Elmaghraby, S. El—Shihaby, and S. El—Kassas Bayan: A Text DatabaseManagementSystemWhichSupports a Full Representation of the ArabicLanguage 12 A. Morfeq, and R. King Responsa: A Full—TextRetrievalSystem with LinguisticProcessing for a 65—MillionWordCorpus of JewishHeritage in Hebrew 22 V. Choueka IconicCommunication: ImageRealism and Meaning 32 F-S Liu, and J-W Ta! Overview of Japanese Kanji and KoreanHangulSupport on the ADABAS/NATURALSystem 42 V. ishi!, and H. Nishimoto Call for Papers 52 SPECIALISSUE ON NON-ENGLISHINTERFACES TO DATABASES ~ IEEECOMPUTERSOCIETY 4 -~ IEEE Editor-in—Chief, Data Engineering Chairperson, TC Dr. Won Kim Prof. LarryKerschberg MCC Dept. of informationSystems and SystemsEngineering 3500West BaiconesCenterDrive GeorgeMasonUniversity Austin, TX 78759 4400UniversityDrive (512) 338—3439 Fairfax, VA 22030 (703) 323—4354 AssociateEditors Vice Chairperson, TC Prof. Dma Bitton Prof. Stefano Coil Dept. of EiectricaiEngineering Dipartlmento di Matematica and ComputerScience Universita’ di Modena University of iiiinois Via Campi 213 Chicago, iL 60680 41100Modena, Italy (312) 413—2296 Prof. MichaeiCarey Secretary, TC ComputerSciencesDepartment Prof. Don Potter University of Wisconsin Dept. of ComputerScience Madison, WI 53706 University of Georgia (608) 262—2252 Athens, GA 30602 (404) 542—0361 Prof. Roger King Past Chairperson, TC Department of ComputerScience Prof. Sushli,Jajodia campus box 430 Dept. of informationSystems and SystemsEngineering University of Colorado GeorgeMasonUniversity Boulder, CO 80309 4400University Drive (303) 492—7398 Fairfax, VA 22030 (703) 764—6192 Prof. Z. MoralOzsoyogiu Department of ComputerEngineering and Science CaseWesternReserveUniversity Cieveiand, Ohio 44106 (216) 368—2818 Dr. Sunil Sarin Distribution XeroxAdvancedInformationTechnology Ms. Lori Rottenberg 4 CambridgeCenter IEEE ComputerSociety Cambridge, MA 02142 1730 Massachusetts Ave. (617) 492—8860 Washington, D.C. 20036—1903 (202) 371—1012 DataEngineeringBuiletin is a quarteriypubiicatlon of Membership in the Data EngineeringTechnicalCommittee the IEEE ComputerSocietyTechnicalCommittee on Data is open to individuals who demonstratewillingness to its of interestincludes: data structures in Engineering . scope activelyparticipate the variousactivities of the TC. A and models,accessstrategies,accesscontroltechniques, member of the IEEE ComputerSociety may Join the TC as a databasearchitecture,databasemachines,inteiligentfront full member. A non—member of the ComputerSociety may ends, massstorage for very largedatabases,distributed Join as a participatingmember, with approvalfrom at least databasesystems and techniques,databasesoftwaredesign one officer of the TC. Both full members and participating and Implementation,databaseutilities databasesecurity members of the TC are entitled to receive the quarterly and relatedareas. bulietin of the TC free of charge, until furthernotice. Contribution to the Bulletin is herebysolicited. News items, letters,technicalpapers, book reviews,meetingpreviews, summaries, casestudies, etc., should be sent to the Editor. All letters to the Editor wiii be considered for publication unlessaccompanied by a request to the contrary. Technical papers are unrefereed. OpinIonsexpressed in contributions are those of the mdi viduaiauthorrather than the officlaiposition of the TC on Data Engineering, the IEEE ComputerSociety, or orga nizations with which the author may be affiiiated. Letterfrom the Editor Severalmonths ago, Won Kimasked me to put together the December 1989issue. Ini tially, we agreed on graphicalinterfaces to databases as the topic. As I beganlooking into the literature and trying to figurewhatpapers to invite, I noticed that there wasvery little out therethat had to do withnon-Englishinterfaces to databases. So I set out to find a broadrange of papers, all dealingwithinterfaceproblemsthatAmerican andEuropean researcherswouldfindunusual. Theindividualswhocontributed to this issuewereasked to put in a lot of effort in a very shortamount of time. A few of them haddifficultywriting in English. Therewere also communicationproblems for the authorslocatedoutside the USA. All of the authors deserve a verywarmthanks. There are two papersdealing withArabic. The firstpaper, by Elmaghraby,El-Shihaby, and El-Kassas, gives an overview of why Arabicpresentsunusualproblems for the developer of interfaces, and whyEnglish-basedsoftware and hardwaremay not be easily adapted to handleArabic. The secondpaperrepresents the Ph.D.thesiswork of Au Mor feq and describes an effort to develop text databasemanagementtechniquesspecifically suited for Arabic. The thirdpaper also deals with textual data. It is by YaacovChoueka and describes a very aggressiveeffort at developing a database of Hebrew texts. This effort has pro ducedresearchcontributionsthat are of use to developers of textretrievalsystems in gen eral, notjust to developers of Hebrew-basedsystems. Thefourthpaper is by FangShengLiu and Ju Wei Tai, andconcerns the development of graphicalinterfaces. It suggests that a useful way of constructing and categorizing graphicaltokens can be based on Chinesecharactertheory. Oneinterestingaspect of this paper is that it is not dedicated to Chinese-basedinterfaces. The fifth paper (by Yoshioki Ishii and HidekiNishimoto)describes the support of Japanese and Koreandatabaseinterfaces. It gives an overview of why the Japanese and Koreanlanguagespresentnovelproblems, and then describes the approach they have taken in developingtheirsystem. I hopethat the readers of thisissuefindthesepapers to be unusual andinteresting. RogerKing December, 1989 1. Problems and Peculiarities of ArabicDatabases A. S. Eliuaghraby1, S. E1-Shihaby2, and S. E1—Kassas3 ABSTRACT This paper presents a brief overview of Arabiclanguagedatabases, their problems, solutions, and directions for the future. Experience is based on implementation of an Arabizationscheme and severaldatabaseapplicationsusingstandardDBMSpackages and programminglanguages.Most of theseproblems are due to the fact that Arabization is not integrated in the design of the hardware and systemprograms. 1. INTRODUCTION The need for ArabicDatabases has beenrecognizedsince the introduction of computers in the Middle East. Modernelectroniccomputers were first introduced to Egypt in the 1950’s starting by the IBM model 1620. Currently all types of computers exist, and are Arabized, with the exception of some high end models. Arabizationschemes for computers are key to understandingproblems of Arabic Databases. It is alsobeneficial to briefly get acquaintedwithsomepeculiarities of the Arabic language. The Arabiclanguage is written from right to left using the Arabiccharacter set with connectedscript.Twentyeightcharacterscompose the Arabic set, howevereach character has multipleshapes.Displaying an Arabicword requiresknowledge of shapeselection of each letter based on the letterspositions in the word. The lack of endogenouscomputerdesigns in the Arabworld have forcedArabization to follow a modification and patchworkapproach.PeripheralArabization has been the most popularapproach in the earlyyears and is used for mainframe and minicomputerswhile microcomputerArabization has been approached with a variety of schemes. Thispaperpresentssome of the problemsfacingDatabasedesign and Databaseinterfaces for Arabicapplications.Some of the necessarybackgroundrelated to Arabiclanguage and Arabizationschemes is also included. One major issue in Arabiclanguage that is briefly presentedwithoutdetail in this paper is diacritics(al-tashkeel) - which has been dropped in mostmodernArabicpublications -- diacritics are mainlyused to improvepronunciation. 1 EMA~SDept.,University of Louisville,Kentucky, USA40292. 2 University of Alexandria,Alexandria,EGYPT. EB Dept.,EindhovenUniversity of Technology, the Netherlands. 2. 2. PROBLEMS OF ARABICDATABASES 2.1. Peculiarities of Arabic and Arabization Differences amongnaturallanguages have caused performancedegradation in many computerenvironments,includingdatabases.However,softwaredesigners,most of the time, includeconsiderations for the differences in Latin-basedlanguages. They do constructlayers to allow the update of the character set -- IBM’sMVS OS allows thatwithin the whole OS interface. This is convenient when applications do not use “clever” techniques and communicatedirectlywithsystemperipherals,such as in the case of Focus on IBMsystems. When the linguisticdifferencesinclude more than minorvariations in the character set, such as in Arabization, serious problems occur. Operating system designers did not anticipate,whendesigningtheirsupervisorcallshandlingterminals, thatentering a character may alter the shape of an alreadydisplayedcharacter. In order to face these problems designers will assist in manoeuveringaround the systemcapabilities to reachtheseneeds. In an attempt to build an Arabicdatabase, the designer is facedwith the obviousproblems of using a non-Englishcharacter set in addition to somepeculiarities that relate to Arabic language and Arabizationschemes. 2.1.1. Orientation The Arabiclanguage is written in a right to left orientation.When the first IBMArabic terminals(XCOM,XCOM1, and XCOM2)wereintroducedtheyallowedwritingcharacters in both orientations;left-to-right (R-L), and right-to-left (L-R). However,because of technicalproblems,