Document Retrieval Using Term Hierarchy Approach: English Translated al-Quran

Hayati Abd Rahman Shahrul Azman Mohd Noah Faculty of Computer and Mathematical Sciences, Faculty of Technology and Information Science Universiti Teknologi MARA Universiti Kebangsaan Malaysia Selangor, Malaysia Selangor, Malaysia [email protected] [email protected]

Abstract— The aim of the study is to improve the effectiveness of of data, where also effects the experiment operation and document retrieval that contains sacred words represented in the evaluation. In this paper, statistical formulas will be exploited English translated of the Quran using term hierarchy which are to generate a hierarchy. generated using language model. Term hierarchy is used to suggest meaningful candidate terms for query expansion in Term relationships that make up the hierarchy can be document retrieval. Previous studies concerning domain-specific developed automatically but the text processing for al-Quran is textual documents similar to the Quran are very limited. The not an easy task to be done. Since the document contains kind nature to retrieve this type of documents cannot depend on the of implicit, ambiguity and allegory elements, the method exact word matching only. There are many words that have cannot only depend on the syntax or grammar rules from the multiple interpretations or meaning that can be perceived or text but it also requires semantic approach. Research done by explained by domain expert. Furthermore, studies in query [1], [2], [3] and [4] shows that the text processing using expansions for the Quran translated text are limited and external sources like thesaurus (like Hadith) or pre-defined document retrieval effectiveness highly depends on user feedback categorization seems to be promising for a better document or user interaction. Term hierarchy generation required retrieval. However, the term hierarchy for Quran text using significant features selection. Since the term collection for the unsupervised approach can still be discussed. In addition, [5] Quran is limited, the term selection process is rather challenging. A total of 558 documents have been generated for testing. Two said that the term relationship in hierarchy able to form a types of term relationship are used: mutual and specific; which simple semantic relationship. In line with his statement, this are identified through the term hierarchy. The subsumption and research is about generating an automatic relationship between standard cosine methods are used to generate the hierarchy. The terms in translated Quran corpus. degree of distribution formula has been used for each term to determine significance term indexes representing the content of II. RELATED WORKS the translated Quran. For evaluating the effectiveness of the term hierarchy in document retrieval, query expansion is used. In A. Searching al-Quran using Semantic Approach query expansion, terms which are related with the query terms are used as the candidate term for additional query terms. A By using multi-language approach, [6] have used three total of 39 queries are used during testing. The results from the different language of Malay, English and Arabic for Quran text experiment show that the generated term hierarchy able to in three different corpuses. The query will be translated to propose suitable candidate terms to the original query statement other languages and documents from the corpus will be and consequently improved the retrieval effectiveness. retrieved after considering the respective language through query expansion. An experiment has been carried out using Keywords- term hierarchy, English translated Al-Quran, query expansion in order to evaluate the effectiveness of infromation retrieval document retrieval. Two types of queries namely as non- semantic query and semantic query, have been use for I. INTRODUCTION comparison purposes. Semantic query is constructed manually using dictionary for both languages by considering only Basically, the term 'hierarchy' is a noun that shows the synonym relationship type. Then further searching from the position level of the structure of things. The term structures that multi-language corpus will be followed up. The result shows make up the hierarchy are a systematic method that can be the document retrieval has performed very well and semantic applied to store and access documents effectively. In addition, approach able to improve the Quran document retrieval. hierarchical structure is a practical method for the decomposition of information and for structuring knowledge Reference [7] has used Quranic Search System (QSS) to from a corpus. Therefore, the study is about the effectiveness test three type of queries; stemmed query, text-based query and of automated term hierarchy without term supervised towards synonym-based query. The corpus consists of classic Arabic document retrieval. There are varieties of method to generate language in text that containing each of verses as the document term hierarchy and it is depending on the needs, type and size collection. From the experimental results, the synonym-based query showed a higher value for mean of average precision as compared to the other type of query. Reference [8] also had a Quran text is highly related to linguistic fields. According similar finding where by using keyword based query, the to [13], the term "dunya" and "akhira" (in Arabic language) searching prone to retrieve the irrelevant documents and appears 115 times in the Quran. The number shows the contribute to low range of recall and precision value. importance and correlation between both terms towards the Therefore, semantic searching approaches able to improve the verses in the Quran. In addition, each term in the Qur'an should quality of retrieval regardless the presence of the keyword. be taken into account separately (independent to each other) during term processing. Reference [14] mentioned that each Reference [9] has studied the content of the Quran in detail word should be represented as 0 or 1 in documents where the and described the knowledge representation of the document. term weighting for each word will differentiate the frequency There are seven patterns found in Quran knowledge between word and able to indicate its significance towards representation such as parable, expression (confession), verses documents. which begin with the word 'Qul', abbreviated letters, the generic and specific verses, muqayyad and mutlaq, and clear However, some of the phrase cannot be treated and unclear verses. They have constructed an Islamic independently. This is because some of the terms are knowledge ontology framework using extraction method by ambiguous. For example, the excerpt from Surah Al-Imran [10]. According to them, ontology is an effective method for [3:6]: “He it is Who shapes you in the wombs as He pleases. conceptualization in semantic web. It contains a standard There is no god but He, the Exalted in Might, the Wise”. repository of knowledge and understandable by machines Practically, the word "He" is referred as “God” or “Allah”. (interoperability). Even though, the process is so complex and Hence, the collocation method [3] should be helpful in tedious at the initial stage (because of it requires domain information retrieval. Another example is the word "Qarun" experts to identify, describe, annotate and label all the and "wealth" (from Surah al-Qasas 28:76-82). In literary, concepts), but the extraction process easily can be computed "Qarun" who was known as an arrogant and rich person, had automatically, once the process is completely done. been swallowed because of his ignorance by the earth together with his property. Thus, in this context, the word "Qarun" Religion 2.00 (from cannot be standalone without the word "wealth". In linguistics, http://www.cafeconleche.org/examples/religion/) is a web the terms are categorized as metonymy. application that contains XML documents from four books of the Jewish scriptures (Old Testament), the Bible (New An application named QurSim by [4] contains documents Testament), the book Marmon and the Quran, where the of Quran text. The development of the corpus is using lexical arrangement of the elements are based on a standard syntactic approach where each of the translated word is based on syntax structure of the Document Type Definition (DTD). Reference and morphology from Arabic to English [11] had described the Quran documents using XML semantics (http://corpus.quran.com) and translation by Pickthall in approach, which is the extension of Religion 2.00 with http://www.sacred-texts.com/ISL/pick/index.htm. The QurSim additional of several features to the existing structure of the applications have been used in Quran ontology by [2] where it attribute like SURA_no, AYAT_total_no and AYAT_no that describes the relationship concept by using predicate logic (per describes each element related tags. Generally, the XML word basis in Arabic) and relationship between words and the structure is made up of elements from Quran that contains entity of verse, person and place. The visualization of the numbers of verses, chapters and juzu’. It involved the process ontology is in form of concept graph that contain 300 concepts of tagging and arranging structures based on Specification linked up with 350 relationships. The root node for the Language for XML Semantics (SLXL) rules. SLXS is a ontology is “Concept” where the node for “Allah” is part of method for describing the semantics of functional, declarative “Concept”. Other example is the term “Yaghuth” is part of and local manner. Therefore, the XML structure is claimed to “False Deity” concept, and “False Deity” is a fraction of be more consistent in term of technical aspects of Quran. “Concept”.

B. Quran Morphology Analysis III. METHODOLOGY The original Arabic as the language of the Quran has been There are five main components, namely as categorization simplified with the translation and transliteration process. of documents and queries, pre-processing of documents, term There are many documents that have been translated to selection and term hierarchy generation and document retrieval English, Malay, Indonesian, German, Urdu, French and other before evaluation phase. Figure 1 below illustrates the research language. Furthermore, the development of information framework. searching for Quran text using search engine is not an easy task. For testing and evaluation purposes, [12] found that All documents in the corpus are in English and the document retrieval using multi-language approach is tedious. categorization phase is explained in Dataset Preparation They had conducted an experiment to evaluate on retrieval section. For pre-processing phase, the documents will be effectiveness of Quran text translation in Malay language using processed using stemming (using Porter stemmer) and stop Roman and Jawi scripts on Indri search engine word removal. As a result of data processing, the total words in (www.lemurproject.org/indri/). For indexing purposes, the corpus are 151,593 with 101,947 are stop words. After programming an automated transliteration for both scripts is considering the repetition of words, the number of words in the very difficult without help from domain experts. corpus is 45,302 including 2,220 nouns. by [17] has been used to implement document retrieval. Query expansion has been utilized together with generated term hierarchy. The term hierarchy is used as a ‘thesaurus’ where each query is provided with suggested list of terms from hierarchy as a candidate term for query. English translated (al-Quran) A list of documents retrieved from the system will be compared to relevant document set. The queries do affect the document retrieval performance and according to [18] document relevancy which is determined by human is still Query Set Document Relevant relevant. For evaluation phase, standard measurements like Corpus Document Set recall, precision, mean average precision and R-precision will be used to evaluate the effectiveness of document retrieval.

IV. DATASETS PREPARATION The Quran document is an example that contains literary text. Most of the text is unstructured, formless and difficult to be Document Pre- interpreted directly. Normally, the context from each Processing document can be determined by the content of the documents which contained any important terms in its vocabulary. Currently, noun has been exploited as the data sample and testing has been done based on natural language processing Term Selection from unsupervised English translated Quran corpus. Translated documents version has been taken from text translation of the Quran from the CD-ROM Al-Bayan by Query Term Hierarchy Yusuf Ali. There are three main activities involved which are Expansion the construction of query set, document set in corpus and relevant document set.

Document A. Collection of Documents Retrieval Generally, stemming and stop word removing in text processing is a common exercise and there are variety of techniques that can be adapted. However, the challenge part is Evaluation that to determine a set of document in corpus from a group of verses. The Quran contains 30 parts and 114 chapters and Figure 1. The research framework 6,236 verses where each chapter and surah consist of a few verses. The structure of any Quran translation document is The next phase is term selection where all words in corpus similar to the original Quran. The most important element in will be filtered and selected using mathematical formula. With the Quran is the verse. A verse can be represented as a the use of certain value threshold, only a few numbers of document sample and a sequence of verses can be represented words from corpus will be considered as term index for term as a document as well. For instance, the returns from any topic hierarchy. In other words, the selected terms will be of search, is in form of verse or a sequence of verses from a documents representative that shows the importance of term. number of surah. Since the searching is based on topic, each symbol) will be considered as a ع The unselected terms does not mean the term is not related to ruku’ or section (with the documents totally, but it just not to be parts in the document in corpus. Each section contains a number of verses hierarchy generation. Moreover, it could reduce the capacity which discuss within a same topic and each section is on storage. However, the number of term index does influence independent to each other. Figure 2 shows a snippet of a the structure of hierarchy. With more terms to be considered, document that consists of the chapter number, verse number the larger hierarchy it will be. In this research, Transition and description of the Quran. Point technique by [15] has been chosen for term selection method because of its reliability on literary document where 4:7> For men is a share of what the parents and the near an experiment has been conducted earlier in [16]. relatives leave, and for women a share of what the parents and The next phase is term hierachy generation. The the near relatives leave, whether it be little or much – an relationship between terms is constructed based on horizontal appointed share. and vertical approach using statistical formulation. The : vertical is a type of hierarchy structure that shows the 14:15> And they sought judgment, and every insolent opposer spesificity while the horizontal shows the similarity/mutuality was disappointed. : between terms. The effectiveness of term hierarchy is tested in Verse number Quran translation document retrieval phase. Search engine system called Terrier Chapter number the ancestor of Y, then the specificity of Y is greater than X. Mutual relationship is defines as the inclusion of the term that Figure 2. Components in document formed to be together with other nodes where the term nodes have the same interest with documents that contain the term. B. Query Set For example, if a document which have the index term of X and have the same interest with other documents that have a The development of query set has considered a level of list of other words such as Y and Z, then it is said that X has difficulty factor based on topics (discussed in [19] and [20]). mutual relationship with Y and Z. The query set contains 39 queries which recovered topics about the creation of man, marriages, corruption, sustenance, livelihood, astronomy, respect for parents and reconciliation in A. Mutual Relationship the war. According to [21], the ideal number of number of Let consider the formulas below. After tokenization of queries for the experiment is within 25 to 50 documents which words, the equation (1) is used identify the relationship value sufficient enough to provide the level of confidence for the between two words and based on the calculation, only the experiment results in information retrieval. Table I shows a list prominent will be considered as term index. By using cosines from a part of queries collection. calculation, each word is represented in vectors and it contained the frequency of words for n-document.

TABLE I. A PART FROM THE QUERY SET n V V  X kYk Query Query Description x y k1 (1) cos(Vx ,Vy )   n n 1. God is Most Knowledgeable; God knows matter in |Vx |  |Vy | 2 2 ; the sky and earth.  X k  Yk k1 k1 2. Provisions are determined by God.

where Vx and Vy is a pair of term vector while X and Y is 3. Information about creation of Jinn. k k an element for each of term vector. As recommended by [23], the threshold value (Th4 ) is 0.05. The cosine measure that is 4. Information human creation viz Prophet Adam and Th Eve. greater than 4 will be considered as mutually relevant. 5. Directive not to turn to from Satan. In order to quantify the similarity between two terms, there are three similarity expression need to be calculated, where t1 t V (t ) t 6. Satan refuses to bow down to Prophet Adam. and 2 are terms, n is the term vector of n .

s0  cos(V (t1),V (t2 )) (2)

#(t1  t2 ) (3) s1  C. Relevant Documents Set #(t1) According to [19], the higher number document in relevant #(t1  t2 ) (4) document set, the better implication on precision of query. By s2  using a domain expert, a verification has been done in order to #(t2 ) determine the relevant documents for each query and 416 documents has been identified. Each of verse from every By considering the conditions below, the mutual relationship chapter has been examined for each query manually. The between two terms can be determined. If the condition of (5) or validation from the expert was based on semantic approach so (6) is comply, meaning that both terms are mutually relevant. that the assessment during document retrieval becoming more s  Th and s  Th (5) perfect. 1 2 0 4

s2  Th3 and s0  Th4 (6) V. TERM HIERARCHY However, if the condition (7) or (8) is fulfilled, meaning that Term hierarchy is an approach to form a link among words both of the terms are irrelevant. The threshold for Th and Th in corpus. Two methods have been used to generate the 2 3 hierarchy using subsumption approach and standard cosine are 0.4 and 0.8, relatively. measurement. These methods are using statistical formulation s  Th and s  Th (7) and they have been used to determine the specificity and 1 2 0 4 mutual relationship between terms, relatively. Specificity is s2  Th3 and s0  Th4 (8) defined as the quantity of words that contain the words from domain-specific information [22]. Basically, the term node with high specificity value is assumed to have a large scale of information as compared to general node. For example, if X is B. Spesificity Relationship collection and parameter l can be ranged between [0,1]. In The language model approach in information retrieval was conjunction with term language model, the correlation introduced by [24] has proposed a scoring method that based between pair of terms is simplified as below; on probability. A language model is a probabilistic mechanism  iD p(Wx | D) p(Wy | D) (11) to define a distribution over all the possible words sequences. p(Wx ,Wy )  Due to the scoring method used to estimate a language model n ;  p(wn | D) for each document, sometime it also known as statistical iD, j1 language model. The hierarchy that worked from general to specific (or could Formally, the idea is that the document is ranked based on be vice versa) and the distribution of one term must be greater likelihood of generating the query. However, this paper is than the other one in the context of association between pair of concentrating on the distribution of word in text vocabulary. terms (Wx and Wy). The subsumption ratio of Wx and Wy can The estimation used is more to text and it can be called as text be defined as; language model. The text language modeling can be referred as a probabilistic model of text where it defined a probability p(Wx |Wy )  p(Wy |Wx ) (12) of distribution over sequence of words. A document, D is interpreted as a vector the representing the presence and VI. EXPERIMENT AND RESULT absence of each word. The multiple Bernoulli distribution The relevant document is determined by human and this model has been used to model the presence and absence of a approach is still valid for study even though it has been used word. for more than 20 years. The Quran are used as a reference Basically, vocabulary, V = {w1,…,w|v|} and the presence document for any questions or queries. Practically, keywords and absence of each word wi is defined as Xi=1 and Xi=0 will be used for searching any verses in Quran. For example, respectively. Based on likelihood estimation, the presence of keywords like 'maroot' is contained in the verse 2:201 while each word, wi in D is relatively represented as; the 'Day of Resurrection' has a semantic relationship with verse 1:4 of 'The Owner of Judgment Day'. Term hierarchical approach is used in the accessed tf (wi , D) (9) p(X i  1| D)  ; document without using other additional thesaurus. With the | D | aid of query expansion methods, query candidate is automatically chosen from the hierarchy. The relationship that where tf(wi,D) is the number of word wi in D while |D| is the is formed from hierarchical term is relevant to be considered number of words in D. adding some semantic value to the original query. Terrier In order to avoid problem caused by words that does not search engine system [17] is used as a standard document appear in a document, a smoothing method is applied. The access. Recall and precision are the main measurement tools to usage of this method is to estimate the non-zero values for be used to evaluate the effectiveness of document retrieval, each word. There are various kinds of smoothing methods and besides precision-R and Mean Average Precision (MAP). the most popular one are Jelinek-Mercer method, Dirichlet Table II describes the increment of the precision value method and Absolute Discounting method. The idea of using query expansion for mutual and specific relation type as smoothing method is to define the ability to compute the compared to the original query. Ideally, recall measurement is probability of a term given which the present in a document so used to evaluate the deliverable of search engines towards that it able to reduce probabilities of terms exist in the retrieving all the relevant documents while precision measures document and to boost the probabilities of terms that are the performance of the relevant documents that are not absent. removed in the retrieved document list [18]. Table II shows the For this paper, experiment has been done using Jelinek- 11-point interpolated standard precision-recall for query Mercer smoothing method. There is neither specific reason nor expansion method with the original query (for benchmarking comparison experiment done to answer of why the Jelinek- purpose) on documents retrieval. Thus, the results of the Mercer method has been chosen. As matter of fact, the method analysis of 39 queries found that the average precision for the uses a linear interpolation of the maximum likelihood model original query was 10% as compared to query expansion which increased up to 18.6% and 15.8% respectively. The MAP value and the background model. The parameter l is used to control for the original query, specificity and mutual relationship is the influence of each model. Below is the language model that 0.091, 0.136 and 0.166 respectively. The result shows that the is smoothed by interpolating the likelihood estimation with a term hierarchy approach able to improve the retrieval result background language model (estimated from the entire and the consistency of increment of precision as compared to collection); the mean from the 11-point precision-recall. tf (w, D) (10) p(w | D)  (1 )  p(w | C) ; | D | TABLE II. THE INCREMENT OF PRECISION VALUE USING QUERY EXPANSION (TERM HIERARCHICAL APPROACH) FROM ORIGINAL QUERIES FOR DOCUMENT where p(w|C) is a background (collection) language model RETRIEVAL that estimate based on number of word appear in the entire Recall Precision Increment Original Query Expansion Mutual Specificity Mutual Specificity Query R/ship R/ship R/ship R/ship 0 0.186 0.371 0.350 0.992 0.879 0.1 0.186 0.336 0.306 0.810 0.651 0.2 0.167 0.285 0.255 0.703 0.525 0.3 0.124 0.226 0.197 0.829 0.594 0.4 0.104 0.180 0.130 0.721 0.246 0.5 0.102 0.152 0.122 0.491 0.191 0.6 0.083 0.117 0.081 0.414 -0.025 0.7 0.054 0.106 0.074 0.971 0.388 0.8 0.052 0.103 0.074 0.990 0.436 0.9 0.021 0.093 0.074 3.550 2.625 1 0.021 0.076 0.068 2.713 2.313 MEAN 0.100 0.186 0.158

Every query has been measured using R-precision. Those figures show the performance for each query using term hierarchy as compared to the original query. Fig. 3 shows an Figure 4. Comparison between all queries for R-Precision using specificity increment of R-precision value for mutual and specificity relationship approach relationship for each query. There are 14 out of 39 show a positive result of the query 1, 3, 7, 9, 10, 11, 16, 18, 19, 20, 22, The mean of the R-precision value from both type of 23, 28 and 38, while three queries (query 2, 4 and 21) show relationships show some improvement as compared to the negative value. The other queries have remained unchanged. benchmark. As discussed earlier, the nature of the Quran Fig.4 also shows some improvement on query expansion document and the querying style are factors that limited the method on specificity relationship approach. There are 11 out effectiveness of retrieval. Not all words from Quran text can be of 39 show increments (for query 1, 3, 7, 9, 11, 12, 18, 19, 23, interpreted directly using interpreter unless we engaged with a 26 and 28), three queries (query 2, 8 and 21) show decrement domain expert. An experiment done by [20] showed good of R-precision and the others remain no changes. result for R-Precision value when using categorization method. Furthermore, the term hierarchy approach also seems to be a promising method that can be considered in future in automated information retrieval.

VII. CONCLUSION The main goal of this research is to evaluate the knowledge representation from domain specific documents (al-Quran) in information retrieval. Without using any external sources like WordNet or any thesaurus, term hierarchy approach has been proposed in order to improve the effectiveness of document retrieval. An empirical experiment has been conducted and the finding shows an encouraging result. However, there are some internal constraints need to be solved as for the enhancement in future research. This paper has recognized two major constraints that distract the term selection process and limit the better achievement during the experiment. The first constraint is the use of symbols like ‘-’ and ‘`’ in the corpus. There is no consistency for any translated text from al-Quran to have a Figure 3. Comparison between all queries for R-Precision using mutual standard name or term to be used in translated al-Quran, for relationship approach example like ‘ad, blood-relationship, marriage-relationship, ninety-nine, setting-place and so on. The second constraint is there are variety of spelling for noun words and it depends on the language of translation, for instance, Satan/Shaitan, Korah, Makkah/Mecca, Eve, Moses dan Yusouf/Yusuf. By condering the constraints, further research is needed in order to improve the effectiveness of information retrieval. With the ongoing research effort, information from al-Quran text can be expanded in more meaningful way besides Hadis as a main reference document.

ACKNOWLEDGMENT This research paper is made possible through the support [12] O. Roslina, and A. W. Fauziah, “Issues in evaluating the retrieval from Universiti Teknologi MARA. We also would like to performance of multiscript translation of Al-Quran,” 6th World Congress of Muslim Librarians and Information Scientists 2011, 2011. thank everyone who helped make this research done. [13] M. A. Haleem, Understanding the Qur’an: Themes and Styles. New York: I. B. Tauris & Co Ltd, 1999. REFERENCES [14] G. Salton, and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing and Management, pp. 513–523, 1988. [1] A. B. Zainab and A. R. Nurazzah, “Evaluating the effectiveness of [15] F. R. Lopez, H. Jimenez-Salazar and D. Pinto, “A competitive term thesaurus and stemming methods in retrieving Malay translated al-Quran selection method for information retrieval,” Lecture Notes on Computer documents,” Lecture Notes in Computer Science, vol. 2911, pp. 653- Science, vol. 4394, pp. 468-475, 2007. 662, 2003. [16] A. R. Hayati and M. N. Shahrul Azman, “A comparative analysis of the [2] K. Dukes and N. Habash, 2010, “Morphological annotation of Quranic entropy and transition point approach in representing index terms of Arabic,” Proceedings of the Seventh International Conference on literary text,” Journal of Computer Science, vol. 7(7), pp. 1088-1093, Language Resources and Evaluation (LREC’10), May 2010. 2011. [3] S. Ebrahimi, M. R. Pahlavannezhad and G. Nadernezhad, “The analysis [17] I. Ounis, G. Amati, V. Plaschouras, B. He, C. Macdonald and C. Lioma, of Quranic collocations in the orchard Boostan of Sa’di”, International “Terrier: A high performance and scalable information retrieval Journal of Linguistics, vol. 4(3), pp. 281–292, 2012. platform,” In Proceedings of ACM SIGIR'06 Workshop on Open Source [4] A. B. Sharaf, and E. S. Atwell, “QurSim: A corpus for evaluation of Information Retrieval, 2006. relatedness in short texts,” 8th International Conference on Language [18] W. B. Croft, M. Bendersky, L. Hang and X. Gu, “Query representation Resources and Evaluation (LREC), pp. 2295–2302, 2012. and understanding workshop,” SIGIR Forum, vol. 44(2), pp. 48-53, [5] H. Joho, C. Coverson, M. Sanderson and M. Beaulieu, “Hierarchical 2010. presentation of expansion terms,” Proceedings of the 2002 ACM [19] T. Sakai, T. Kitani and Y. Ogawa, “BMIR-J2: A test collection for Symposium on Applied Computing, 2002. evaluation of Japanese information retrieval systems,” ACM SIGIR [6] M. Y. Mohd Amin, Z. Roziati and A. Noorhidawati, “Semantic method Forum, vol.33(1), pp. 13-17, 1999. for query translation,” The International Arab Journal of Information [20] H. Mohd Pouzi, “Phrase and semantic relationships in knowledge Technology (IAJIT First Online Publication), vol. 10(3), 2013. representation: the effect on retrieval effectiveness of Malay [7] A. T. Al-Taani and A. M. Al-Gharaibeh, “Searching concepts and documents,” “Frasa dan hubungan semantik dalam perwakilan keywords in the Holy Quran,” The International Arab Conference on pengetahuan: kesan terhadap keberkesanan capaian dokumen Melayu”, Information Technology (ACIT’2011), 2011. Ph.D Thesis. Universiti Kebangsaan Malaysia, 2006. [8] M. Shoib, M.. Nadeem Yasin, U. K. Hikmat, M. I. Saeed and M. S. H. [21] C. Buckley and E. M. Voorhees, “Evaluating evaluation measure Khiyal, “Relational WordNet model for semantic search in Holy stability,” Proceedings of the 23rd Annual International ACM-SIGIR Quran,” International Conference on Merging Technologies (ICET Conference on Research and Development in Information Retrieval, pp. 2009), pp. 29– 34, 2009. 33-40, 2000. [9] S. Saidah, S. Naomie, Z. Hakim and N. Shahrul Azman, “A Framework [22] P. M. Ryu, and K. S. Choi, “Determining the specificity of terms using for Islamic knowledge via ontology representation,” International inside-outside information: a necessary condition of term hierarchy Conference on Information Retrieval & Knowledge Management mining,” Internet Processing Letters, vol. 100(2), pp. 76-82, 2006. (CAMP 2010), pp. 310–314, 2010. [23] L. Chen, M. L’Abbate, U. Thiel and E. J. Neuhold, “The layer-seeds [10] S. Saidah, S. Naomie and O. Nazlia, ”Keyphrase extraction for Islamic term clustering method: enabling proactive situation-aware product knowledge ontology,” International Symposium on Information recommendations in e-commerce dialogues,” Journal Information Technology (ITSim2008), pp. 1-6, 2008. Systems Frontiers, vol. 7, pp. 405-419 ,2005. [11] Y. Kotb, K. Gondow, and T. Katayama, “A case study for XML [24] J. M. Ponte and W. B. Croft, “A language modeling approach to semantics checker model,” IEEE International Conference on Systems, information retrieval,” In Research and Development in Information Man and Cybernetics, vol. 5, pp. 4834–4839, 2003. Retrieval, pp. 275–281, 1998.