Automatic Learning Assistant in Telugu

Meghana Bommadi, Shreya Terupally, Radhika Mamidi
Language Technologies Research Centre, International Institute of Information Technology
[email protected], [email protected], [email protected]

Abstract

This paper presents a learning assistant that tests one's knowledge and gives feedback that helps a person learn at a faster pace. A learning assistant (based on automated question generation) has extensive uses in education, information websites, self-assessment, FAQs, testing ML agents, research, etc. Multiple researchers and companies have worked on virtual assistance, but majorly in English. We built our learning assistant to help with teaching in the mother tongue, which is the most efficient way of learning (Roshni, 2020; Nishanthi, 2020). Our system is built primarily on Question Generation in Telugu.

Many experiments have been conducted on Question Generation in English in multiple ways. We have built the first hybrid machine learning and rule-based solution in Telugu, which proves efficient for short stories or short passages in children's books. Our work covers the fundamental question forms with question types: adjective, yes/no, adverb, verb, when, where, whose, quotative, and quantitative (how many/how much). We constructed rules for question generation using Part of Speech (POS) tags and Universal Dependency (UD) tags along with linguistic information from the relevant context surrounding the word. Our system is primarily built on question generation in Telugu, and it is also capable of evaluating the user's answers to the generated questions.

1 Introduction

Research on Virtual Assistants is widespread, since they are being widely used in recent times for numerous tasks. These assistants are built using large datasets and high-end Natural Language Understanding (NLU) and Natural Language Generation (NLG) tools. NLU and NLG are used in interactive NLP applications such as AI-based dialogue systems/voice assistants like Siri, Google Assistant, Alexa, and similar personal assistants. Research is still going on to make these assistants work in major Indian languages as well.

An automated learning assistant like our system is useful not only for the learning process of humans but also for machines in the process of testing ML systems (Hidenobu Kunichika, 2004). Research has been done on Question Answer generating systems in English (Maria Chinkina, 2017), concentrating on basic Wh-questions with a rule-based approach (Payal Khullar), question-template-based approaches (Hafedh Hussein, 2014), etc. For a low-resourced language like Telugu, a completely AI-based solution can be non-viable: there are hardly any datasets available for the system to produce significant accuracy. A completely rule-based system, on the other hand, might leave out principal parts of the abstract, and there is a chance that not all questions can be captured inclusively by completely handwritten rules. Hence, we introduce a mixed rule-based and AI-based solution to this problem.

Our system works in the following three crucial steps:

1. Summarization
2. Question Generation
3. Answer Evaluation

We implemented summarization using two techniques, viz. Word Frequency (see 4.1) and TextRank (see 4.2), which are explained further in section 4.


We attempted to produce questions concentrating on the critical points of a text that are generally asked in assessment tests. Questions posed to an individual challenge their knowledge and understanding of specific topics, so we formed questions on each sentence in as many ways as possible. We based this model on children's stories, so the questions we aim to produce are simpler and more objective.

Based on the observation of the chosen data and an analysis of all the possible cases, we developed a set of rules for each part of speech that can be formed into a question word in Telugu. We maximized the possible number of questions in each sentence with all the keywords. We built rules for question generation based on POS tags, UD tags and the information surrounding the word, which is comparable with Vibhaktis (case markers) in Telugu grammar.

The question generation is manually evaluated, and the detailed error analysis is given in section 8.1. Our Learning Assistant evaluates answers using string matching and keyword matching for Telugu answers, and a pre-trained sentence transformer model using XLM-R (Nils Reimers, 2019) for answers in other languages.

2 Related Work

Previously, Holy Lovenia, Felix Limanta et al. [2018] (Holy Lovenia, 2018) experimented on Q&A pair generation in English, where they succeeded in forming What, Who, and Where questions. Rami Reddy et al. [2006] (Rami Reddy Nandi Reddy, 2006) worked on a dialogue-based Question Answering system in Telugu for railway inquiries, which majorly concentrated on Answer Generation for a given query. Similar work was done by (Hoojung Chung) on a practical question answering system in a restricted domain. Shudipta Sharma et al. [2018] (Shudipta Sharma) worked on automatic Q&A pair generation for English and Bengali texts using NLP tasks like verb decomposition and subject-auxiliary inversion for question tags.

3 Dataset

We have used a Telugu stories dataset taken from the website "kathalu wordpress" (https://kathalu.wordpress.com/). This dataset was chosen because of the variety in the themes of the stories, the wide vocabulary, and sentences of varying lengths.

1. Number of stories: 21
2. Average number of sentences: 56
3. Average number of words: 281
4. Genre of the stories: Moral Stories for Children

For testing, we used stories by Prof. N. Lakshmi Aiyar:

1. Number of stories: 5
2. Average number of sentences: 190
3. Average number of words: 1060
4. Genre of the stories: Realistic Fiction

4 Summarization

Since Telugu is a low resource language, we used statistical and unsupervised methods for this task. Summarization also ensures the portability of our system to other similar low resource languages.

For summarization, we did basic data preprocessing (spaces, special characters, etc.) in addition to root-word extraction using Siva Reddy's POS tagger (http://sivareddy.in/downloads).

We used two types of existing summarization techniques:

1. Word Frequency-based summarization
2. TextRank-based summarization

4.1 Word Frequency-based Summarization

WFBS (Word Frequency-based Summarization) scores sentences using the word frequencies in the passage (Ani Nenkova; Mr. Shubham Bhosale). This process is based on the idea that the keywords or main words will appear frequently in the text, and words with lower frequency have a high probability of being less related to the story.

All the sentences that carry crucial information are produced successfully by this method, because in children's stories the keywords are used repeatedly and consequently have the highest frequency.

We used a dynamic ratio (a ratio that can be changed or chosen by the user as input) for getting the desired amount of summary (a short summary or a longer summary; for example, given a ratio of k%, the system outputs the k% of sentences containing the most frequent words from the dictionary). This ratio, when dynamically changed, performed better than a fixed ratio of word selection.

The steps followed in WFBS are:

1. Sentences are extracted from the input file.
2. The file is preprocessed and the words are tokenized.
3. Stop words are removed.
4. The frequency of each word is calculated and stored in dictionaries.
5. The sentences with the least frequent words are removed.
6. The ratio of words that occur in highest-to-lowest frequency order is calculated.
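The snippet below is a minimal Python sketch of this frequency-based selection. It is an illustration only: it uses a whitespace tokenizer, a placeholder stop-word list, and a naive sentence splitter instead of the POS-tagger-based root-word extraction used in our pipeline, and all function names are ours.

```python
import re
from collections import Counter

# Placeholder stop-word list; the real system removes Telugu stop words
# and works on lemmas (root words) produced by the POS tagger.
STOP_WORDS = {"మరియు", "ఒక", "ఆ", "ఈ"}

def split_sentences(text):
    # Naive split on sentence-final punctuation, for illustration only.
    return [s.strip() for s in re.split(r"[.।?!]", text) if s.strip()]

def word_frequency_summary(text, ratio=0.3):
    """Return roughly `ratio` of the sentences, scored by word frequency."""
    sentences = split_sentences(text)
    words = [w for s in sentences for w in s.split() if w not in STOP_WORDS]
    freq = Counter(words)

    # Score each sentence by the total frequency of its (non-stop) words.
    scores = [
        sum(freq[w] for w in s.split() if w not in STOP_WORDS)
        for s in sentences
    ]

    # Keep the top `ratio` of sentences, preserving the original order.
    k = max(1, int(len(sentences) * ratio))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```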

4.2 TextRank-based Summarization

TextRank is a graph-based ranking model (Joshi, 2018; Liang, 2019) that prioritizes each element based on the values in the graph. This process is done in the following steps:

1. A graph is constructed using each sentence as a node, using the frequency of words.
2. The similarity between two nodes is marked as the edge weight between them.
3. Each sentence is ranked based on its similarity with the whole text.
4. The PageRank algorithm is run until convergence.
5. The sentences with the top N rankings are given as the output summarized text.

The TextRank algorithm is a graph-based method that updates the sentence score WS iteratively using the following equation (1):

WS(V_i) = (1 - d) + d * \sum_{V_j \in In(V_i)} \frac{w_{ij}}{\sum_{V_k \in Out(V_j)} w_{jk}} * WS(V_j)    (1)

where d is the damping factor (0.85) and w_{ij} is the similarity measure between the i-th and j-th sentences.

This method has the advantage of ranking the sentences using the similarity between two sentences instead of high-frequency words.

We used two kinds of similarity measures for the TextRank-based summarization:

1. Common words: A measure of similarity based on the number of common words in two sentences after removing stop words. We used root-word extraction of the common words for better results, since Telugu is a fusional and agglutinative language and repeats words with a different suffix each time.

2. Best Match 25 (BM25): A measure of the similarity between two passages based on term frequencies in the passage (Federico Barrios, 2016).

The results observed by this method capture crucial information of the story, but lower readability and fluency were observed. Between the similarity measures, BM25 has shown slightly better results, since the BM25 algorithm ranks sentences based on the importance of particular words (inverse document frequency, IDF) instead of just raw word frequency.
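A compact Python sketch of this iteration is given below. It uses the common-word similarity described above and the update in equation (1); the tokenization, stop-word handling, the fixed iteration count (instead of a convergence check), and the function names are simplifying assumptions rather than the exact implementation.

```python
def common_word_similarity(s1, s2, stop_words=frozenset()):
    """Similarity = number of shared (non-stop) words between two sentences."""
    w1 = {w for w in s1.split() if w not in stop_words}
    w2 = {w for w in s2.split() if w not in stop_words}
    return float(len(w1 & w2))

def textrank(sentences, d=0.85, iterations=50, top_n=3):
    """Rank sentences with the iterative update of equation (1)."""
    n = len(sentences)
    # w[i][j]: edge weight between sentence i and sentence j.
    w = [[common_word_similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]          # denominator of equation (1)
    ws = [1.0] * n                             # initial sentence scores

    for _ in range(iterations):
        ws = [
            (1 - d) + d * sum(
                (w[j][i] / out_sum[j]) * ws[j]
                for j in range(n) if j != i and out_sum[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: ws[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]
```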

5 Answer Phrase Selection

Candidate answers are words/phrases that depict some vital information in a sentence. Adjectives, adverbs, and the subject of a sentence are some examples of such candidates.

The answer selection module utilizes two main NLP components - POS tagging (Part of Speech tagging) and UD parsing (Universal Dependency parsing) - along with language-specific rules to determine the answer words in an input sentence.

5.1 POS Tagging

We followed the state-of-the-art method by Siva Reddy et al. (2011) (Siva Reddy, 2011), "Cross-Language POS Taggers", an implementation of a TnT-based Telugu POS tagger (https://bitbucket.org/sivareddyg/telugu-part-of-speech-tagger/src/master/), to parse our data. The tagger learns morphological analysis and POS tags at the same time, and outputs the lemma (root word), POS tag, suffix, gender, number and case marker for each word. The model was pre-trained on a Telugu corpus containing approximately 3.5 million tokens and had an evaluation accuracy of 90.73% for the main POS tag.
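For the rules that follow, it helps to picture the tagger's per-token output as a small record. The sketch below is our own illustrative container; the field names and the example values are assumptions, not the tagger's actual output format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TokenAnalysis:
    """One word of a sentence with the analysis described in section 5.1."""
    form: str                 # surface form as written in the story
    lemma: str                # root word
    pos: str                  # main POS tag, e.g. "NN", "JJ", "V"
    suffix: Optional[str]     # case marker / suffix, e.g. "ki", "lO", "gA"
    gender: Optional[str]
    number: Optional[str]

# Hypothetical analysis of a single token, shown in transliteration.
token = TokenAnalysis(form="rAmuDiki", lemma="rAmuDu", pos="NN",
                      suffix="ki", gender="m", number="sg")
```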

5.2 UD Tagging

A Bi-LSTM model using Keras is structured and trained on the Telugu UD tags dataset UD_Telugu-MTG (https://github.com/UniversalDependencies/UD_Telugu-MTG) (Bogdani, 2018). The Bi-LSTM model outputs the UD tags for each word in a sentence. We considered the subject, which is marked "subj" by the UD tagger, as the selected answer phrase for a sentence, on the condition that the root and the punctuation are marked correctly.

This model gave 85% accurate results, including the PAD (padding) tags, which might not be an adequate result; but given these conditions, and given that the "subj" tag is labeled only scarcely in a sentence, the results have been considered acceptable.
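A minimal Keras sketch of such a sequence tagger is shown below. The vocabulary size, tag count, sequence length, and other hyperparameters are placeholders, not the values used for UD_Telugu-MTG.

```python
from tensorflow.keras import layers, models

# Placeholder sizes; the real values depend on the UD_Telugu-MTG preprocessing.
VOCAB_SIZE, N_UD_TAGS, MAX_LEN = 20000, 18, 60

def build_ud_tagger():
    """Bi-LSTM that predicts one UD tag per input token (padded sequences)."""
    model = models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, 128, mask_zero=True),
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
        layers.TimeDistributed(layers.Dense(N_UD_TAGS, activation="softmax")),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# build_ud_tagger().fit(...) would then be trained on word-index sequences
# and per-token UD tag indices derived from the treebank.
```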
5.3 Rules

The outputs of the POS tagging and UD parsing modules are used as the crucial markers in our language-specific rules. In addition to conditions based on word surroundings, these tags select one or more answer phrases in each sentence. We classify the rules into different categories, typically based on their usage and interrogative forms.

1. Quantifiers, Adjectives, Adverbs: Words with the QC, RB, and JJ POS tags, respectively. For words with JJ tags, the word and the corresponding determiners (if present) are selected as the answer candidate.

2. Possession based: Words with PRP and NN tags that have the suffixes "簿", "뱊క呍", "吿" and "呁" ("Ti", "yokka", "ki" and "ku"). The suffix "簿" ("Ti") is used for words like "అతꀿ" ("atani" - his), "퐾ళ챍" ("vAlla" - theirs), "కం簿" ("kanTi" - eyes'), "퐿頾뱍쁁鱍 ల" ("vidyArthula" - students').

3. Time-Place based: Noun words with a "졊" ("lO") suffix, along with other words present in a custom list of time-related words; "렾쀿ꁍం屍", "ఇయ쁍" ("morning", "year") come under this category.

4. Direct and Reported Speech: The word "అꀿ" ("ani") is generally used to denote direct speech in Telugu. Phrases before the word "అꀿ", along with phrases in quotation marks, are chosen as answer phrases.

5. Verbs: Telugu generally follows the SOV (Subject Object Verb) structure. If the last word of a sentence has a "V" POS tag, then we select the verb and the adjacent adverbs as an answer candidate.

6. Subject: We use the UD tags to determine the subject of a sentence. As an additional check, we only select the candidate subjects in those sentences whose last word is tagged as the root verb and whose subject is a noun.
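The sketch below shows how two of these rules (possession and verb) can be expressed over per-token POS/suffix information. It is a simplified illustration in transliteration; the predicate names and the token representation are ours, and the real rules also consult UD tags and the surrounding context.

```python
# Each token: (form, pos, suffix) in transliteration, e.g. ("rAmuDiki", "NN", "ki").
POSSESSION_SUFFIXES = {"Ti", "yokka", "ki", "ku"}

def possession_candidates(tokens):
    """Rule 2: pronouns/nouns carrying a possessive or dative suffix."""
    return [form for form, pos, suffix in tokens
            if pos in {"PRP", "NN"} and suffix in POSSESSION_SUFFIXES]

def verb_candidate(tokens):
    """Rule 5: if the sentence-final word is a verb (SOV order),
    pick it together with an immediately preceding adverb."""
    if not tokens or not tokens[-1][1].startswith("V"):
        return None
    phrase = [tokens[-1][0]]
    if len(tokens) > 1 and tokens[-2][1] == "RB":
        phrase.insert(0, tokens[-2][0])
    return " ".join(phrase)

# Toy transliterated sentence with assumed tags ("Sita gave a book to Ram").
sentence = [("sIta", "NN", None), ("rAmuDiki", "NN", "ki"),
            ("pustakaM", "NN", None), ("iccinadi", "V", None)]
print(possession_candidates(sentence))   # ['rAmuDiki']
print(verb_candidate(sentence))          # 'iccinadi'
```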

6 Question Formation

Questions are formed according to the phrases chosen previously, and the question words are substituted using further conditions where required.

1. Quantifiers, Adjectives, Adverbs: Words marked with the JJ POS tag are replaced with "ఎ籁వం簿" ("eTuvanti" - what kind of). RB-tagged words that are followed by verbs and carry a "尾" ("gA") suffix are replaced by "ఎ젾" ("elA" - how), and QC-tagged words that are not articles ("ఒక", "oka" - one/once) are chosen and changed based on the following word. If the quantifier is followed by "�తం", "మం頿", "వర呁" ("shAtam", "maMdi", "varaku"), then the word is replaced with "ఎంత" ("eMta" - how much); if the quantifier has a suffix, it is added to the question word. For example: "1700呁" becomes "ఎంత呁" ("eMtaku"), and the rest of the quantifiers, like "ఐ顁 ꠿桁桍క졁" (meaning five sparrows), are replaced with "ఎꀿꁍ" ("enni" - how many) ("ఎꀿꁍ ꠿桁桍క졁" - how many sparrows - in this case).

2. Possession based: The nouns and pronouns that satisfied the rules are replaced with "ఎవ쀿" ("evari" - whose), and the dative cases are replaced with "ఎవ쀿吿" ("evariki" - to whom). This could be an exception for non-human nouns and pronouns; in the children's stories, most of the nouns are personified, so there were fewer errors than we presumed.

For example: A sentence with a phrase like "쀾롁萿 ఇ졁졍 ..." (Ram's house...) would form a question like "ఎవ쀿 ఇ졁졍 .." (whose house..).

3. Time-Place based: We made a list of words that are used to convey time. If the lemma of the word matches a word in this dictionary, then we mark it as "time" and replace it with "ఎపుప్葁" ("eppuDu" - when); otherwise it is marked as a place and replaced with "ఎక呍డ" ("ekkaDa" - where).

For example: A sentence with the phrase "쁇ꡁ వ遍葁" (he will come tomorrow) will form the question "ఎపుప్葁遍 వ葁?" (when will he come?).

4. Direct and Reported Speech: The whole speech phrase, or the phrase that is quoted, is replaced with "ఏమꀿ" ("Emani") in the sentence.

For example: A phrase in quotes in a sentence like 顁쁋뱍ధꁁ葁 "ఏమం簿퐿 ఏమం簿퐿..!" అꀿ అꀾꁍ葁. (Duryodhana said, "What did you say..!") would form a question like 顁쁋뱍ధꁁ葁 ఏమꀿ అꀾꁍ葁? (What did Duryodhana say?).

5. Verbs: The verb is replaced with "ఏ렿 桇遍" ("Emi cEstU" - doing what) plus the appropriate suffix, which is chosen from the information lost in the lemmatized word.

Additionally, the verb tags were used to form polar questions. The interrogative form of a sentence in Telugu can be constructed by adding intonation to the verb, so we add the vowel "ఆ" ("A") at the end of the verb to make a yes/no question. The answer phrase to this question is "అ푁ꁁ" ("avunu" - yes), followed by the original phrase.

For example: A sentence with a verbal phrase like "త 푆챁遂 ఉం頿" (Sita is going) will form the question "త ఏ렿 桇遍 ఉం頿?" (What is Sita doing?).

6. Subject: Based on the suffix of the verb, the subject is replaced with "ఏ頿", "ఏ퐿" ("Edi", "Evi") or "顇ꀿ", "푇簿吿" ("dEni", "vETiki") (meaning what, which, respectively), or with "ఎవ쁁" ("evaru" - who) if the subject has a gender and is marked as human in the POS tags; the root suffix is changed accordingly for "ఎవ쁁" ("evaru" - who (honorific)).

For example: "గంగ అక呍萿 ꁁం栿 푆찿졍 ꡋ밿ం頿." (Ganga left from that place) forms the question "ఎవ쁁 అక呍萿 ꁁం栿 푆찿졍 ꡋ밾쁁?" (Who left from there?).
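The substitution itself can be pictured as a small lookup from the selected answer phrase's tag and marker to a question word, followed by a string replacement. The sketch below is written in transliteration and covers only a few of the rules above; the table contents and function names are illustrative assumptions.

```python
# Question word chosen from the POS tag (and marker) of the answer phrase,
# in transliteration: JJ -> eTuvaMTi, RB+gA -> elA, time -> eppuDu, place -> ekkaDa.
QUESTION_WORDS = {
    ("JJ", None): "eTuvaMTi",
    ("RB", "gA"): "elA",
    ("NN", "time"): "eppuDu",
    ("NN", "place"): "ekkaDa",
}

def form_question(sentence, answer_phrase, pos, marker=None):
    """Replace the selected answer phrase with its question word."""
    q_word = QUESTION_WORDS.get((pos, marker))
    if q_word is None:
        return None
    return sentence.replace(answer_phrase, q_word) + "?"

# Toy example (transliterated): "rEpu vastADu" -> "eppuDu vastADu?"
print(form_question("rEpu vastADu", "rEpu", "NN", "time"))
```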
7 Answer Evaluation

The user's answer to a generated question is evaluated in one of two ways, depending on the form of the input:

1. Telugu Answer Evaluation
2. Multilingual Answer Evaluation

7.1 Telugu Answer Evaluation

A string input in Telugu is taken from the user, and string matching is done for the whole sentence against the answer phrase stored during Question and Answer Pair Generation. The answer can be either in sentence form or in a phrasal form that contains the keywords on which the question was formed.

7.2 Multilingual Answer Evaluation

7.2.1 Sentence Transformers

Similar to word embeddings, where the learned representations of the same words are similar, sentence embeddings (Nikhil, 2017) map the semantic information of sentences into vectors. Multilingual sentence embeddings deal with sentences in multiple languages, which are mapped into a closer vector space if they have similar meanings.

Sentence Transformers are multilingual sentence embeddings (Ivana Kvapilíková, 2020; Mikel Artetxe, 2019) built using BERT / RoBERTa / XLM-RoBERTa & Co. with PyTorch (Reimers, 2021; Horev, 2018; Ferreira, 2020). This framework provides an easy way of computing dense vector representations of sentences in multiple languages. They are called sentence transformers since the models are based on transformer networks like BERT / RoBERTa / XLM-RoBERTa, etc.

We use a pre-trained sentence transformer (Nils Reimers, 2019) based cross-lingual sentence embedding system which can take a sentence in a language and create an embedding in a multilingual space. The answer phrases and sentences are stored in a dictionary. The answers in a different language are taken as input, projected into the multilingual space, and the similarity is checked using cosine similarity with the stored answer phrase in Telugu.
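A minimal sketch of this two-branch evaluation is shown below, using the sentence-transformers library. The particular pre-trained checkpoint, the similarity threshold, and the language check are assumptions for illustration; the paper's system uses an XLM-R-based pre-trained sentence transformer (Nils Reimers, 2019).

```python
from sentence_transformers import SentenceTransformer, util

# Assumed multilingual checkpoint; any XLM-R-based sentence transformer works similarly.
model = SentenceTransformer("stsb-xlm-r-multilingual")

def is_telugu(text):
    # Telugu Unicode block: U+0C00 - U+0C7F.
    return any("\u0c00" <= ch <= "\u0c7f" for ch in text)

def evaluate_answer(user_answer, stored_telugu_answer, threshold=0.6):
    """Keyword/string match for Telugu input, cosine similarity otherwise."""
    if is_telugu(user_answer):
        gold_words = set(stored_telugu_answer.split())
        given_words = set(user_answer.split())
        return len(gold_words & given_words) > 0   # keyword overlap
    # Multilingual branch: embed both answers and compare in the shared space.
    emb = model.encode([user_answer, stored_telugu_answer], convert_to_tensor=True)
    score = util.pytorch_cos_sim(emb[0], emb[1]).item()
    return score >= threshold
```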

In the final system, we used syntax matching to mark the user's answer if the input is in Telugu, and used sentence transformers if the input is in any other language.

8 Results

We obtained results that resemble commonly used questions, covering nine POS and UD tags. The questions generated by this system are successful and are most similar to the academic questions we see in textbooks. We did manual error analysis for the question and answer pairs generated. In most cases, the system produced legible results that resemble human-made questions, but there were errors in a few complex sentences. Out of the 916 questions formed, only 34 were either completely erroneous or illegible. The rest were both grammatically correct and significant for the context of the story. The system successfully obtained all possible questions for each simple sentence, not requiring further linguistic analysis.

Table 1 lists the number of times each question word occurred and the number of times it appeared wrong in the experiment with five stories. Table 2 in the appendix shows sample questions and answers generated by the system for children's stories.

Question word            Occurrences   Errors
ఎ젾 (elA)                 64            2
ఎꀿꁍ (enni)                76            5
ఎంత呁 (eMtaku)             4             0
ఎంత (eMta)                3             0
ఎవ쀿 (evari)               187           0
ఎవ쀿吿 (evariki)            1             0
ఏ렿 (Emi)                  69            3
顇ꀿ (dEni)                 45            10
ఎవ쁁 (evaru)               20            0
ఎపుప్葁 (eppuDu)            7             0
ఎక呍డ (ekkaDa)             21            5
ఏ렿 桇遍 (Emi cEstU)        148           2
ఏమꀿ (Emani)               10            0
ఆ (A)                     148           0
ఎ籁వం簿 (eTuvaMTi)          103           6
푇簿吿 (vETiki)              10            1

Table 1: Question types, with the number of occurrences and errors per question word.

8.1 Question Generation Error Analysis

The question generation of the system was manually annotated by two human evaluators with a Computational Linguistics background. The guidelines given to the evaluators were:

• Questions with grammatical mistakes are marked as errors.
• Questions with semantic errors are marked as errors.
• Questions that are highly irrelevant to the story are marked as errors.

Errors are equally influenced by the word tags, the context of the word, and the word's position in a sentence. We analysed each and every way the errors occurred and could occur.

Errors in "elA" ('how') questions are often caused by spaces between the words and suffixes in the dataset we chose.

"enni" (quantifier-based) questions are built from diverse quantifiers (for example: time, age, number of people - these quantifiers are often written as sandhi with the word, which causes the POS tagger to give ambiguous tags) and from the numerous ways of writing quantifiers in Telugu. A few quantifier question-word errors occurred due to wrong POS tagging of cross-coded words (words that are actually in English but written in Telugu script). In Telugu, two numbers are used together to represent non-specific quantities between the two numbers (x y means from x to y); for example, "reMDu (two) mUDu (three) nimishAlu (minutes)" means two to three minutes. This kind of representation makes the system assume there are two quantifiers, and the sentence becomes eligible for two questions based on them.

"dEni" (subject-based) questions have errors because of ambiguous suffixes and inaccuracies in UD tagging. The lack of human identification in the system made human subjects also replaceable with "dEnini" instead of "evarini". Another error was due to subjects that were nominal (names) with end syllables similar to common suffixes (which are included as word context in the rule formation).
These names were split and formed incorrect question words. For example, the name "Shalini" was converted to an interrogative form as "dEnini". The rest of the errors are due to wrong POS tags, cross-coded words, and initials/abbreviations.

"Emi" ('what') question forms also have similar POS tag and cross-coding issues. A few of these errors occurred due to punctuation marks inside a sentence, breaking it up into multiple sentences.

"eTuvaMTi" ('what-kind-of') question forms run into issues where there is personification. General questions based on adjectives for humans are based on a person's subtle qualities; however, in a few cases the adjective that was chosen is inapt to be formed into a question (less similar to a human-made question). The question that was formed was still grammatically correct for both human and non-human subjects; nevertheless, it is more suitable and precise for a non-human noun. For example: "ఎ젾ం簿 �젿ꀿ" (what kind of Shalini) versus "ప쀿చయమౖెన �젿ꀿ" (the Shalini that I know).

"ekkaDa" ('where') based question forms show errors when an abstract word is used as a place, for example "in thoughts", "in that age". Certain quantitative words in Telugu can be appended with -lO to convey meanings like "in youth", "in hundreds"; they tend to pass the rules in question generation. Our list of time-related words is not exhaustive, so a few time-related words are also tagged under "ekkaDa" (place) because of the same suffix.

Most of the tags are error free except for a few ambiguous errors, since the rules either select answer phrases precisely or do not consider them. Some examples of the questions produced by the system are listed in Table 2 in the appendix. The results can be improved, making the question formation more precise, by increasing the number of rules after observing further data.

Anaphora resolution is a limitation of this system; thus, most of the inappropriate phrasing in the answer section was caused by this.

For example:

Q: ఎవ쀿 చ顁వం逾 籀졋 , ద쀾灍 尾 ... 尿ం頿?
Q: Whose studies got completed in the city luxuriously?
A: ꁀ చ顁వం逾 籀졋 , ద쀾灍 尾 ... 尿ం頿 .
A: Your studies got completed in the city luxuriously.

In this case the question is aptly formed, but the answer is slightly ill-formed.

There were a few errors due to the POS tagger we used: it marked wrong POS tags for cross-coded text.

For example:

Q: ꁀలం 呁렾వ遍, ఎꀿꁍ?
Q: Neelam Kumawath, how many?
A: ꁀలం 呁렾వ遍 , ఐ .
A: Neelam Kumawath, I.

The error in this question and answer pair is the 'I': the initial "ఐ" (in "Neelam Kumavat, I") is marked as a number.

9 Conclusions

We have built a mixed rule-based and AI-based question and answer generating system with 96.28% accuracy. We used two methods for summarization and two similarity measures. We constructed observation-based rules for the dataset in a particular domain. There is a chance of varying results if we test this system on data in a different domain, but it gives accuracies above 95% for any data in the chosen domain.

We also tested question generation in the news article domain, which gave grammatically correct questions. The error rate may increase if we use complex words and phrases that need tags beyond the proposed set of rules.

We plan to extend our work to include:

1. Anaphora Resolution
2. Extending to other domains
3. Covering more types of questions
4. Improving the UD tagging model

For testing the meticulousness of the user, as a future task, we wish to use:

1. Questions on minor details
2. NE (Named Entities) and CN (Common Nouns)

References

Ani Nenkova and Lucy Vanderwende. The impact of frequency on summarization.

Bogdani. 2018. Parts of speech tagging using LSTM (Long Short Term Memory) neural networks.

Federico Barrios, Federico López, Luis Argerich, and Rosita Wachenchauzer. 2016. Variations of the similarity function of TextRank for automated summarization.

Diogo Ferreira. 2020. What are sentence embeddings and why are they useful?

Hafedh Hussein, Mohammed Elmogy, and Shawkat Guirguis. 2014. Automatic English question generation system based on template driven scheme.

Hidenobu Kunichika, Tomoki Katayama, Tsukasa Hirashima, and Akira Takeuchi. 2004. Automated question generation methods for intelligent English learning systems and its evaluation.

Holy Lovenia, Felix Limanta, and Agus Gunawan. 2018. Automatic question-answer pairs generation from text.

Hoojung Chung, Young-In Song, Kyoung-Soo Han, Do-Sang Yoon, Joo-Young Lee, and Hae-Chang Rim. A practical QA system in restricted domains.

Rani Horev. 2018. BERT explained: State of the art language model for NLP.

Ivana Kvapilíková, Mikel Artetxe, Gorka Labaka, Eneko Agirre, and Ondřej Bojar. 2020. Unsupervised multilingual sentence embeddings for parallel corpus mining.

Prateek Joshi. 2018. An introduction to text summarization using the TextRank algorithm (with Python implementation).

Xu Liang. 2019. Understand TextRank for keyword extraction by Python: A scratch implementation by Python and spaCy to help you understand PageRank and TextRank for keyword extraction.

Maria Chinkina and Detmar Meurers. 2017. Question generation for language learning: From ensuring texts are read to supporting learning.

Mikel Artetxe and Holger Schwenk. 2019. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond.

Mr. Shubham Bhosale, Ms. Diksha Joshi, Ms. Vrushali Bhise, and Rushali A. Deshmukh. Automatic text summarization based on frequency count for Marathi e-newspaper.

Nishant Nikhil. 2017. Sentence embedding: A literature review - Towards Data Science.

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks.

Rajathurai Nishanthi. 2020. Understanding of the importance of mother tongue learning.

Payal Khullar, Konigari Rachna, Mukul Hase, and Manish Shrivastava. Automatic question generation using relative pronouns and adverbs.

Rami Reddy Nandi Reddy and Sivaji Bandyopadhyay. 2006. Dialogue based question answering system in Telugu.

Nils Reimers. 2021. SentenceTransformers documentation - Sentence-BERT: Sentence embeddings using Siamese BERT-networks.

Roshni. 2020. 8 reasons why the NEP's move to teaching in mother tongue could transform teaching and learning in India.

Shudipta Sharma, Muhammad Kamal Hossen, Md. Sajjatul Islam, Md. Shahnur Azad Chowdhury, and Md. Jiabul Hoque. Automatic question and answer generation from Bengali and English texts.

Siva Reddy and Serge Sharoff. 2011. Cross language POS taggers (and other tools) for Indian languages: An experiment with using Telugu resources.

10 Appendix

Q: ఎ籁వం簿 롋ట 運 వంగడం కష籍 ం尾 푁ం頿?
A: అంత ꡆద額 롋ట 運 వంగడం కష籍 ం尾 푁ం頿

Q: 框పుప్졁籍 , బట졁 , 尾灁졁 , ప챁챍 , 尿ꁆꁍ졁 బ瀾쁁졋 ఎ젾 告ꀿ , ఊ챋챍 ఇం簿ం簿吿 푆찿졍 అ롁롍呁ꁇ 퐾葁?
A: 框పుప్졁籍 , బట졁 , 尾灁졁 , ప챁챍 , 尿ꁆꁍ졁 బ瀾쁁졋 చవక尾 告ꀿ , ఊ챋챍 ఇం簿ం簿吿 푆찿졍 అ롁롍呁ꁇ 퐾葁

Q: 렾న졍 ꁀꁍ 롋ట క簿籍 , 尾萿ద 례ద 푇 , బ瀾쁁 ꁁం栿 ఊ챋챍 , ఊ챋졍 ꁁం栿 逿쀿尿 ఎవ쀿 ఇం簿吿 逿ꡇꡍ퐾葁?
A: 렾న졍 ꁀꁍ 롋ట క簿籍 , 尾萿ద 례ద 푇 , బ瀾쁁 ꁁం栿 ఊ챋챍 , ఊ챋졍 ꁁం栿 逿쀿尿 అతꀿ ఇం簿吿 逿ꡇꡍ퐾葁

Q: అ렾యక ꠿桁క ఎక呍డ吿, ఎం顁呁 అꀿ అడగ呁ం萾, ఆ 吾呁లꁁ 屁萿葍 尾 న렿롍 ఏ렿 桇ం頿?
A: అ렾యక ꠿桁క ఎక呍డ吿, ఎం顁呁 అꀿ అడగ呁ం萾, ఆ 吾呁లꁁ 屁萿尾 న렿롍 퐾簿運 葍 푆찿챍ం頿.

Q: ꠿桁క 렾ట నమ롍졇顁 క頾 , 頾ꀿ వౖెꡁ అసహ뱍ం尾 桂 మ쁋 ఎꀿꁍ 顆బ끍졁 푇쁁?
A: ꠿桁క 렾ట నమ롍졇顁 క頾 , 頾ꀿ వౖెꡁ అసహ뱍ం尾 桂 మ쁋 쁆ం葁 顆బ끍졁 푇쁁

Q: ఆ 吾呁ల運 ꠿桁క吿 ꁍహం అ밿뱍ం頾?
A: అ푁ꁁ, ఆ 吾呁ల運 ꠿桁క吿 ꁍహం అ밿뱍ం頿.

Q: ఒ吾ꁊకపుప్葁 ఎక呍డ ఒక అ렾యకꡁ ꠿桁క 푁ం葇頿?
A: ఒ吾ꁊకపుప్葁 ఒక ఊ쀿졋 ఒక అ렾యకꡁ ꠿桁క 푁ం葇頿.

Q: ఏమꀿ ꠿桁క ꠾쁍 鱇య ప萿ం頿?
A: 뀾끋뱍! 뀾끋뱍! ꀾ తꡇꡍ례 졇顁, ꁇꁁ అ렾య呁쀾젿ꀿ, ꁇꁇ례 桇య졇顁, నꁁꁍ వ頿졇యం萿! అꀿ ꠿桁క ꠾쁍 鱇య ప萿ం頿.

Table 2: Sample questions and answers generated by the system.

Q: What kind of sack was hard to carry?
A: That much of a heavy sack was hard to carry.

Q: In the market how was he buying sandals, clothes, bangles, fruits, utensils - and sold them in the village?
A: In the market he was buying sandals, clothes, bangles, fruits, utensils for cheap rates and sold them in the village.

Q: Packing all the things, putting them on the donkey, from market to village, from village to whose house was he taking them?
A: Packing all the things, putting them on the donkey, from market to village, from village to his own house he was taking them.

Q: How did the innocent sparrow believe the crows without even asking why and where?
A: The innocent sparrow believed the crows blindly without even asking why and where.

Q: Instead of believing the sparrow, looking at it with disgust how many times did they beat it?
A: Instead of believing the sparrow, looking at it with disgust they beat it 2 times.

Q: Did the sparrow make friends with the crows?
A: Yes, the sparrow made friends with the crows.

Q: Once upon a time where was the innocent sparrow living?
A: Once upon a time the innocent sparrow was living in a village.

Q: What did the sparrow say pleadingly?
A: The sparrow said pleadingly, "No! no! I didn't do any mistake, I'm innocent, I did nothing, please leave me."

Table 4: Translations of the questions and answers in Table 2.

List of words related to time:
'అపుప్葁', '쁋灁', '吾లం', 'యం吾లం', 'ఉదయం', 'మ鰾뱍హꁍం', '쀾逿쁍', 'పగ졁', 'ꁆల', '퐾రం', 'సంవతరం', '쀾뱍సమయం',遍 '�둋దయం', '頿నం', 'సమయం', 'వర렾నం'遍, 'ꡂర푍ం', 'భ퐿ష뱍遁遍', 'మ퐾రం', 'మంగళ퐾రం', '끁ధ퐾రం', '屁쁁퐾రం', '�క쁍 퐾రం', 'శꀿ퐾రం', 'ఆ頿퐾రం', '렾సం'

Translations: then, day, time period, evening, morning, afternoon, night, morning (synonym), month, week, year, sunset, sunrise, day (synonym), time, present, past, future, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday, month (synonym).

Table 3: This set comprises the time-related words that have a high chance of being used in a storybook.
