ADDIS ABABA UNIVERSITY
SCHOOL OF GRADUATE STUDIES

Develop an Audio Search Engine for Amharic Speech Web Resources

Arega Hassen Mohammed

Advisor: Solomon Atnafu (PhD)

A Thesis Submitted to the School of Graduate Studies of the Addis Ababa University in Partial Fulfillment for the Degree of Master of Science in Computer Science

Addis Ababa, Ethiopia
October, 2019

ADDIS ABABA UNIVERSITY
SCHOOL OF GRADUATE STUDIES

Arega Hassen Mohammed
Advisor: Solomon Atnafu (PhD)

This is to certify that the thesis prepared by Arega Hassen Mohammed, titled: Design and Development of a Model for an Audio Search Engine for Amharic Speech Web Resources and submitted in partial fulfillment of the requirements for the Degree of Master of Science in Computer Science complies with the regulations of the University and meets the accepted standards with respect to originality and quality.

APPROVED BY:
EXAMINING COMMITTEE:
1. Advisor: Dr. Solomon Atnafu, ____ ____________________
2. Examiner: __________________________________________
3. Examiner: ___________________________________________

ABSTRACT

Most general-purpose search engines, such as Google and Yahoo, are designed with the English language in mind. As content in non-resource-rich languages has grown on the web, the number of online speakers of these languages has grown enormously. Amharic, a morphologically rich language whose morphology strongly affects the effectiveness of information retrieval, is one of the non-resource-rich languages with rapidly growing content on the web in all forms of media, such as text, speech, and video. With the increasing number of online radio stations and speech-based reports and news, retrieving Amharic speech from the web is becoming a challenge that needs attention. As a result, the need for a speech search engine that handles the specific characteristics of users' Amharic-language queries and retrieves Amharic-language speech web documents is becoming more apparent.
In this research work, we develop an audio search engine for Amharic speech web resources that enables web users to find the speech information they need in the Amharic language. To do so, we enhanced an existing crawler for Amharic speech web resources, transcribed the Amharic speech, indexed the transcribed speech, and developed query preprocessing components for users' text-based queries. As baseline tools, we used the open source tools JSpider and Datafari for web document crawling, parsing, indexing, ranking, and retrieval, and Sphinx for speech recognition and transcription. To evaluate the effectiveness of our Amharic speech search engine, we measured precision and recall on the retrieved speech web documents. The experimental results showed that the Amharic speech retrieval engine achieved 80% precision on the top 10 results and a recall of 92% of its corresponding retrieval engine. Overall, the evaluation results of the system are promising.

Key Words: audio search engines, audio information retrieval, information retrieval in the Amharic language, speech crawler, Amharic speech identification.

Dedication

To my grandmother Zewdie Endeshaw Yalew, and
to all people who have played a role in shaping me into the person I am today.

Acknowledgement

First and foremost, I would like to thank God for giving me health and patience in completing my thesis work. Next, I wish to express my sincere gratitude to my advisor, Dr. Solomon Atnafu, for his continuous and constructive advice and guidance throughout this work. Throughout the period of this thesis work, he showed me the way forward and provided me with important resources, such as related research papers. He also worked through problems with me and pointed the way when we encountered challenges with the tools. I would like to thank Mekonnen Assefaw and Sara Abebe for reviewing the work and offering crucial feedback.
I am also highly thankful to Hafte Abera for his advice to work on this problem area, and to my classmates Masereshaye and Tehitena for their technical support and advice. I also wish to express my special thanks to my friend Dani Bekele. Since I cannot list everyone, I thank all who helped me, directly or indirectly, in the accomplishment of my work.

Table of Contents

ABSTRACT
Acknowledgement
LIST OF FIGURES
LIST OF TABLES
LIST OF ALGORITHMS
Acronyms and Abbreviations
Chapter One: Introduction
  1.1 Background
  1.2 Motivation
  1.3 Statement of the Problem
  1.4 Objectives
  1.5 Methods
  1.6 Scope and Limitations
  1.7 Application of Results
  1.8 Organization of the Thesis
Chapter Two: Literature Review
  2.1 Introduction
  2.2 Information Retrieval
  2.3 Information Retrieval from Audio Documents
    2.3.1 Retrieval Models
  2.4 Segmentation
    2.4.1 Audio Classification
  2.5 Speech Recognition
  2.6 Feature Extraction
    2.6.1 Mel-Frequency Cepstral Coefficients
    2.6.2 MPEG-Based Features
  2.7 Search Engines
    2.7.1 Crawlers
    2.7.2 Indexing
    2.7.3 Query Engine Component
  2.8 Spoken Document Retrieval
    2.8.1 English
    2.8.2 Amharic
  2.9 Comparison of Audio with Text Information Retrieval
  2.10 Language Identification
Chapter Three: Related Work
  3.1 Introduction
