Id 1 Question ______ fields come under natural language processing. A Computer Science B C Linguistics D All of the mentioned Marks 1.5 Unit 1

Id 2 Question NLP is concerned with the interactions between computers and human (natural) languages. A True B False Marks 1.5 Unit 1

Id 3 Question What are the main challenges of NLP? A Handling of Sentences B Handling Tokenization C Handling POS-Tagging D All of the mentioned Marks 1.5 Unit 1

Id 4 Question The study of linguistic sounds and their relations to words is the study of phonology. A True B False Marks 1.5 Unit 1

Id 5 Question A text is composed of a set of ______from a vocabulary A language B lines C words D Both A and B Marks 1.5 Unit 1

Id 6 Question Choose from the following areas where NLP can be useful. A Automatic Text Summarization B Automatic Question-Answering Systems C Information Retrieval D All of the mentioned Marks 1.5 Unit 1

Id 7 Question How the text is used A Creating language mnemonics B Identify objects C Building the syntactic tree of a sentence D None of the above Marks 1.5 Unit 1

Id 8 Question What is named-entity recognition? A Identify entity with language structure B Identifying pre-defined entity types in a sentence C Categorize structure of entity D All of the above Marks 1.5 Unit 1

Id 9 Question Word sense disambiguation mainly deals with ______. A Identify of words B Identify language structure C Figuring out the exact meaning of a word or entity D Categorize syntax of word Marks 1.5 Unit 1

Id 10 Question What it means of ? A Extracting subject-predicate-object triples from a sentence B Identify syntax of words C Creating labeling on words D All of the above Marks 1.5 Unit 1

Id 11 Question Phonetics and phonology is a subject related to ______. A Study of words and syntax B Information of words and language C The study of linguistic sounds and their relations to words D None of the above Marks 1.5 Unit 1

Id 12 Question Natural language processing is divided into the two subfields of A Symbolic and numeric B Algorithmic and heuristic C Time and motion D Understanding and generation Marks 1.5 Unit 1

Id 13 Question Which of the following is a demerit of a top-down parser? A It is hard to implement B Slow speed C Inefficient D Both B and C Marks 1.5 Unit 1

Id 14 Question In linguistic morphology, ______ is the process of reducing inflected words to their root form. A Rooting B C Text-Proofing D Both Rooting & Stemming Marks 1.5 Unit 1

Id 15 Question How many steps are there in NLP? A 1 B 4 C 6 D 5 Marks 1.5 Unit 1

Id 16 Question Which of the following is used to map sentence plan into sentence structure? A Text planning B Sentence planning C Text Realization D None of the Above Marks 1.5 Unit 1

Id 17 Question Which of the following includes major tasks of NLP? A Discourse Analysis B C D All of the above Marks 1.5 Unit 1

Id 18 Question What is the full form of NLG? A Natural Language Generation B Natural Language Genes C Natural Language Growth D Natural Language Generator Marks 1.5 Unit 1

Id 19 Question ______method is used to increase standard of NLP. A Summarize blocks of text B Automatically generate keyword tags C Identify the type of entity extracted D All of the above Marks 1.5 Unit 1

Id 20 Question Machine translation converts ______. A Human language to machine language B One human language to another C Any human language to English D Machine language to human language Marks 1.5 Unit 1

Id 21 Question Which of the following NLP tasks use sequential labeling technique? A POS tagging B Named Entity Recognition C D All of the above Marks 1.5 Unit 1

Id 22 Question Which of the following techniques can be used for keyword normalization in NLP, the process of converting a keyword into its base form? A Lemmatization B Soundex C Cosine Similarity D N-grams Marks 1.5 Unit 1

Id 23 Question Which one of the following is a keyword normalization technique in NLP? A Stemming B Part of Speech C Named entity recognition D Structure Marks 1.5 Unit 1

Id 24 Question In NLP, the process of removing words like “and”, “is”, “a”, “an”, “the” from a sentence is called ______. A Stop words B Lemmatization C Stemming D None of the above Marks 1.5 Unit 1

Id 25 Question In NLP, the process of converting a sentence or paragraph into tokens is referred to as stemming. A True B False Marks 1.5 Unit 1

Id 26 Question What type of ambiguity exists in the word sequence “Time flies”? A Syntactic B Semantic C Phonological D Anaphoric Marks 1.5 Unit 1

Id 27 Question Text analysis can be broken into several sub-categories, including morphological, grammatical, syntactic and semantic analyses. A True B False Marks 1.5 Unit 1

Id 28 Question ______defines how words and sentences are put together. A Syntax B words C language D Structure Marks 1.5 Unit 1

Id 29 Question ______ is used to explain the content of spoken expressions. A voice B words C Pragmatics D syntax Marks 1.5 Unit 1

Id 30 Question Dividing a sentence into phrases is known as ______. A parsing B chunking C syntax D morphology Marks 1.5 Unit 1

Id 31 Question ‘My cat likes to drink milk’: how many NPs and VPs are present? A NP 3, VP 3 B NP 2, VP 4 C NP 1, VP 5 D NP 3, VP 2 Marks 1.5 Unit 1

Id 32 Question Sentence 1: Chop the carrots on the board. Sentence 2: She’s the chairman of the board. The above examples fall under the ______ category. A B Syntax analysis C Stemming D None of the above Marks 1.5 Unit 1

Id 33 Question ______ problem makes human language ambiguous. A Syntax ambiguity B Lexical ambiguity C Semantic ambiguity D All of the above Marks 1.5 Unit 1

Id 34 Question Ambiguity is the primary difference between natural and ______. A Human language B Computer language C Modern language D None of the above Marks 1.5 Unit 1

Id 35 Question Breaking a string of characters into a sequence of words is called ______. A Word segmentation B Syntax creation C Lemmatization D All of the above Marks 1.5 Unit 1

Id 36 Question Word: ‘Independently’. Perform morphological analysis of the given word. A Independent+ly B In+(depend+ent)+ly C Independ+ently D Both A and C Marks 1.5 Unit 1

Id 37 Question Sentence: I ate the spaghetti with meatballs. Count the NP and VP phrases. A NP 3, VP 2 B NP 2, VP 3 C NP 3, VP 3 D NP 1, VP 4 Marks 1.5 Unit 1

Id 38 Question Sentence 1: Ellen has a strong interest in computational linguistics. Sentence 2: Ellen pays a large amount of interest on her credit card. In the above example, the word “interest” illustrates which type of problem? A Syntax ambiguity B Word sense disambiguation C Language problem D Both A and B Marks 1.5 Unit 1

Id 39 Question The Python multiplication operation can be applied to lists. What happens when you type ['good', 'morning'] * 3? A ['good morning', 'good morning', 'good morning'] B ['goodmorning', 'goodmorning', 'goodmorning'] C ['good', 'morning', 'good', 'morning', 'good', 'morning'] D Both A and B Marks 1.5 Unit 1
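
Worked illustration for Id 39: a minimal sketch of what the `*` operator does to a Python list. The variable names `words` and `repeated` are illustrative only and do not come from the question.

```python
# Multiplying a list by an integer concatenates that many copies of it.
# The result is still a list (square brackets), not a tuple.
words = ['good', 'morning']
repeated = words * 3

print(repeated)                 # ['good', 'morning', 'good', 'morning', 'good', 'morning']
print(type(repeated).__name__)  # list
print(len(repeated))            # 6
```

The elements are repeated in order; the strings themselves are not joined.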

Id 40 Question Which Python operation is used to combine two strings? A multiplication B concatenation C combine D string Marks 1.5 Unit 1

Id 41 Question ______ toolkit is used for creating NLP applications. A learn B numpy C NLTK D None of the above Marks 1.5 Unit 1

Id 42 Question ______ Python syntax is used to search for a keyword in a text. A text.concate("keyword") B text.concordance("keyword") C Text.search("keyword") D All of the above Marks 1.5 Unit 1

Id 43 Question ______python function is used to calculate length of text. A calc B len C length D Both A and C Marks 1.5 Unit 1

Id 44 Question How do you draw a graphical plot of the frequency distribution in Python? A Fdist.plot() B Fdist.bar() C Fdist.gpl() D None of the above Marks 1.5 Unit 1

Id 45 Question ______ function is used to convert uppercase letters to lowercase. A lower() B isupper() C issmall() D Both B and C Marks 1.5 Unit 1

Id 46 Question ______ function is used to convert lowercase letters to uppercase. A islower() B isbig() C upper() D Both A and B Marks 1.5 Unit 1

Id 47 Question ______python function is used to print only substrings. A Slice[] B Cut[] C Substring[] D All of the above Marks 1.5 Unit 1

Id 48 Question Given name = 'Monty', which expression prints M as output? A name[:3] B name[0] C name[0:2] D None of the above Marks 1.5 Unit 1

Id 49 Question How is stemming applied to the word ‘playing’? A Remove ‘ing’ from the word ‘playing’ B Remove ‘play’ from the word ‘playing’ C Create another word D None of the above Marks 1.5 Unit 1

Id 50 Question ______is the technical name for text. A corpa B corpus C word D document Marks 1.5 Unit 1

Id 51 Question What is the result of applying stemming to the word ‘studied’? A stud B studying C studied D study Marks 1.5 Unit 1

Id 52 Question In morphological analysis, what will be the root forms of the given words: analyzing, stopped, dearest? A Analyze, stop, dear B Analyzed, stop, dear C Analyze, stopping, dear D Both B and C Marks 1.5 Unit 1

Id 53 Question Given name = 'DBATU', which expression prints BATU as output? A name[1:5] B name[2:3] C name[0] D name[3:4] Marks 1.5 Unit 1
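
Worked illustration for Ids 48 and 53: a short sketch of string indexing and slicing in Python. The key point is that a slice `s[start:stop]` includes `start` and excludes `stop`. The variable `monty` is an illustrative name, not one used in the questions.

```python
name = 'DBATU'

print(name[0])    # D    (single character at index 0)
print(name[1:4])  # BAT  (indices 1, 2, 3; the stop index 4 is excluded)
print(name[1:5])  # BATU (indices 1 through 4)
print(name[1:])   # BATU (omitting the stop index runs to the end)

monty = 'Monty'
print(monty[0])   # M
print(monty[:3])  # Mon
```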

Id 54 Question Sentence: This is a sample sentence, showing off the stop words filtration. What will be the sentence after stop words removal? A This is a sample sentence, showing the stop words filtration B This sample sentence, showing stop words filtration. C is a sample sentence, showing the stop words filtration D This is a sample sentence, showing off the stop words filtration Marks 1.5 Unit 1
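
Worked illustration for Id 54: a minimal sketch of stop word filtering. The stop list `STOP_WORDS` below is a deliberately tiny, hypothetical list chosen for this sentence, and `remove_stop_words` is an illustrative helper; real toolkits ship much longer lists, and the output depends on which list is used.

```python
# Hypothetical, deliberately tiny stop list (an assumption for this sketch).
STOP_WORDS = {"is", "a", "off", "the"}

def remove_stop_words(sentence):
    """Drop every whitespace-separated word whose lowercase form,
    ignoring trailing commas and periods, is in STOP_WORDS."""
    kept = [w for w in sentence.split()
            if w.lower().strip(",.") not in STOP_WORDS]
    return " ".join(kept)

sentence = "This is a sample sentence, showing off the stop words filtration"
filtered = remove_stop_words(sentence)
print(filtered)  # This sample sentence, showing stop words filtration
```

A longer stop list that also contains "this" would remove the first word as well, so the exact result is a property of the chosen list rather than of the technique.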

Id 55 Question ______helps make a machine understand the meaning of a text. A Syntax analysis B Semantic analysis C Structure analysis D None of the above Marks 1.5 Unit 1

Id 56 Question ______analysis is used to analyze comment of customer or feedback of any product. A Semantic B syntactic C sentiment D Both A and B Marks 1.5 Unit 1

Id 57 Question What will be the lemmas of the given words: Boy’s, cars, colors? A Boy, cars, colors B Boys, cars, colors C Boys, cars, color D Boy, car, color Marks 1.5 Unit 1

Id 58 Question Google Assistant uses ______ method of NLP. A Text to speech B Speech to text C voice D Both A and B Marks 1.5 Unit 1

Id 59 Question ______ Google product is used to translate a given word into any other language. A Google search B Google translate C Google conversion D None of the above Marks 1.5 Unit 1

Id 60 Question ______ is a lexical reference system based on psycholinguistic theories of human lexical memory. A word B C label D Syntactic analysis Marks 1.5 Unit 1

Id 61 Question Sentence 1: John’s home was decorated with lights on the occasion of Christmas. Sentence 2: Mercury is situated in the eighth house of John’s horoscope. In the above sentences, which is the best example of WordNet synonymy? A decorated B John’s horoscope C home D lights Marks 1.5 Unit 1

Id 62 Question Number of trivial substrings in “GATE2020” are ______. A 2 B 4 C 23 D 1 Marks 1.5 Unit 1

Id 63 Question If your words are Woodchuck, woodchuck, etc., what will be the pattern or regular expression? A [wW]oodchuck B [WW]oodchuck C [ww] oodchuck D Both B and C Marks 1.5 Unit 1

Id 64 Question Find out the correct pair of hypernymy and hyponymy. A Lion->bird B Peacock->sky C Boy->house D Bus->railway Marks 1.5 Unit 1

Id 65 Question Sentence: Sherry’s mother is very generous but she is stingy. Which antonymy is present in the given sentence? A Mother->generous B Generous->stingy C Mother->she D None of the above Marks 1.5 Unit 1

Id 66 Question Find out the correct pair of meronymy and holonymy. A Men->computer B Water->bag C Mobile->water D Kitchen->house Marks 1.5 Unit 1

Id 67 Question Find out wrong pair of wordnet: Antonymy. A Achieve – Fail B Idle – stable C Afraid – Confident D All of the above Marks 1.5 Unit 1

Id 68 Question Find out correct pair of wordnet: entailment which is present in given sentences Sentence: A big animal elephant is snoring when he is sleeping. A Big->elephant B Elephant->snoring C Snoring->sleeping D Both A and B Marks 1.5 Unit 1

Id 69 Question If you want to match digits, what will be the regular expression pattern? A [0123456789] B [Any digit] C [0912345adc] D None of the above Marks 1.5 Unit 1

Id 70 Question If you want to match words like [good1], [number3], what will be the regular expression pattern? A ^[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)?$ B ^[a-Z0-9]+(?:_[a-zA-Z0-9]+)?$ C ^[a-zA-Z0-9]+(?: D None of the above Marks 1.5 Unit 2

Id 71 Question What is regular expression pattern for given strings: oh! ooh! oooh! ooooh! A *h! B o*h! C oo*h! D Both A and B Marks 1.5 Unit 2

Id 72 Question What is regular expression pattern for given strings: baa baaa baaaa baaaaa A ba+ B baa+ C Baa+b D Baaa! Marks 1.5 Unit 2
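
Worked illustration for Ids 71 and 72: a small sketch that checks candidate patterns against the listed strings with Python's `re` module. `fullmatch` is used so that the whole string must match; the names `oh_pattern`, `baa_pattern` and the extra test strings `"h!"` and `"ba"` are illustrative additions.

```python
import re

oh_pattern = re.compile(r"oo*h!")  # one 'o', then zero or more 'o', then 'h!'
baa_pattern = re.compile(r"baa+")  # 'ba', then one or more 'a'

oh_candidates = ["oh!", "ooh!", "oooh!", "ooooh!", "h!"]
baa_candidates = ["ba", "baa", "baaa", "baaaa", "baaaaa"]

oh_matches = [s for s in oh_candidates if oh_pattern.fullmatch(s)]
baa_matches = [s for s in baa_candidates if baa_pattern.fullmatch(s)]

print(oh_matches)   # ['oh!', 'ooh!', 'oooh!', 'ooooh!']
print(baa_matches)  # ['baa', 'baaa', 'baaaa', 'baaaaa']

# The looser pattern o*h! also accepts "h!", which is not in the list.
print(bool(re.fullmatch(r"o*h!", "h!")))  # True
```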

Id 73 Question ______tokenization step is used to arrange words alphabetically. A sorting B searching C capitalize D None of the above Marks 1.5 Unit 2

Id 74 Question The word state-of-the-art being treated as “state of the art” is due to an issue of ______. A Sentence issue B Words problem C Tokenization D None of the above Marks 1.5 Unit 2

Id 75 Question A search algorithm takes ______as an input and returns ______as an output. A Input, output B Problem, solution C Solution, problem D Parameters, sequence of actions Marks 1.5 Unit 2

Id 76 Question N-grams are defined as the combination of N keywords together. How many bi-grams can be generated from given sentence: “coursera website is a great source to learn data science”. A 9 B 11 C 10 D 12 Marks 1.5 Unit 2
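
Worked illustration for Id 76: a minimal sketch that builds bi-grams by pairing each token with its successor, assuming simple whitespace tokenization and no sentence-boundary markers. A sentence of n tokens then yields n - 1 bi-grams. The helper name `bigrams` is illustrative.

```python
def bigrams(sentence):
    """Return the list of adjacent token pairs in a whitespace-tokenized sentence."""
    tokens = sentence.split()
    return list(zip(tokens, tokens[1:]))

s = "coursera website is a great source to learn data science"
pairs = bigrams(s)

print(len(s.split()))  # 10 tokens
print(len(pairs))      # 9 bi-grams
print(pairs[:2])       # [('coursera', 'website'), ('website', 'is')]
```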

Id 77 Question How many phrases can be generated from the following sentence after performing the following text cleaning steps: stopword removal; replacing punctuation with a single space? “#coursera-website is a great source to learn @data_science.” A 6 B 5 C 10 D 9 Marks 1.5 Unit 2

Id 78 Question Which of the following regular expression can be used to identify date(s) present in the text object? “The next meetup on data science will be held on 2017-09-21, previously it happened on 31/03, 2016” A \d{4}-\d{2}-\d{2} B (19|20)\d{2}-(0[1-9]|1[0-2])-[0-2][1-9] C (19|20)\d{2}-(0[1-9]|1[0-2])-([0-2][1-9]|3[0-1]) D None of the above Marks 1.5 Unit 2

Id 79 Question N-grams are defined as the combination of N keywords together. How many bi-grams can be generated from given sentence? “She is cute and adorable”. A 3 B 5 C 6 D None of the above Marks 1.5 Unit 2

Id 80 Question Which of the following techniques can be used to compute the distance between two word vectors in NLP? A Lemmatization B Euclidean distance C Cosine Similarity D Both B and C Marks 1.5 Unit 2

Id 81 Question What are the possible features of a in NLP? A Count of the word in a document B Vector notation of the word C Part of Speech Tag D All of the above Marks 1.5 Unit 2

Id 82 Question In NLP, Tokens are converted into numbers before giving to any Neural Network. A True B False Marks 1.5 Unit 2

Id 83 Question Identify the odd one out. A NLTK B scikit-learn C SpaCy D BERT Marks 1.5 Unit 2

Id 84 Question Let G = (V, T, S, P) be a context-free grammar such that Variables V = {S, R}, Terminal symbols T = {0, 1}, Productions P = {S → R1R1R1R, R → 0R | 1R | $}. Which of the following languages are supported by this grammar? A L = {w | w contains at least three 1’s} B {w | the length of w is odd and its middle is 0} C {w | w contains more 1's than 0's} D All of the above Marks 1.5 Unit 2

Id 85 Question In NLP, Bidirectional context is supported by which of the following embedding. A B BERT C GloVe D All the above Marks 1.5 Unit 2

Id 86 Question Language Biases are introduced due to historical data used during training of word embeddings, which one amongst the below is not an example of bias stmt 1:New Delhi is to India, Beijing is to China stmt 2:Man is to Computer, Woman is to Homemaker A Stmt1 B Stmt 2 C Stmt 1 and stmt2 D None of the above Marks 1.5 Unit 2

Id 87 Question What is the right order of components for a text classification model? 1. Text cleaning 2. Text annotation 3. Gradient descent 4. Model tuning 5. Text to predictors A 12345 B 13425 C 12534 D 13452 Marks 1.5 Unit 2

Id 88 Question Which of the following models can perform tweet classification with regards to context mentioned above? A Naive Bayes B SVM C None of the above D Both A and B Marks 1.5 Unit 2

Id 89 Question Which algorithm is used for solving temporal probabilistic reasoning? A Hill-climbing search B Hidden markov model C Depth-first search D Breadth-first search Marks 1.5 Unit 2

Id 90 Question How is the state of the process described in an HMM? A Literal B Single random variable C Single discrete random variable D None of the mentioned Marks 1.5 Unit 2

Id 91 Question What are the possible values of the variable? A Variables B Literals C Discrete variable D Possible states of the world Marks 1.5 Unit 2

Id 92 Question Where are the additional variables added in an HMM? A Temporal model B Reality model C Probability model D All of the mentioned Marks 1.5 Unit 2

Id 93 Question Where is the speech model used? A Speech recognition B Understanding of real world C Both Speech recognition & Understanding of real world D None of the mentioned Marks 1.5 Unit 2

Id 94 Question Which variable can give the concrete form to the representation of the transition model? A Single variable B Discrete state variable C Random variable D Both Single & Discrete state variable Marks 1.5 Unit 2

Id 95 Question Parsing determines Parse Trees (Grammatical Analysis) for a given sentence. A True B False Marks 1.5 Unit 2

Id 96 Question What is a top-down parser? A Begins by hypothesizing a sentence (the symbol S) and successively predicting lower level constituents until individual preterminal symbols are written B Begins by hypothesizing a sentence (the symbol S) and successively predicting upper level constituents until individual preterminal symbols are written. C Begins by hypothesizing lower level constituents and successively predicting a sentence (the symbol S) D Begins by hypothesizing upper level constituents and successively predicting a sentence (the symbol S) Marks 1.5 Unit 2

Id 97 Question Social Media platforms are the most intuitive form of text data. You are given a corpus of complete social media data of tweets. How can you create a model that suggests the hashtags? A Perform Topic Models to obtain most significant words of the corpus B Train a Bag of Ngrams model to capture top n-grams – words and their combinations C Train a word2vector model to learn repeating contexts in the sentences D All of the above Marks 1.5 Unit 2

Id 98 Question While working on context extraction from text data, you encountered two different sentences: The tank is full of soldiers. The tank is full of nitrogen. Which of the following measures can be used to resolve the word sense ambiguity in the sentences? A Compare the dictionary definition of an ambiguous word with the terms contained in its neighborhood B Co-reference resolution, in which one resolves the meaning of an ambiguous word using the proper noun present in the previous sentence C Use dependency parsing of the sentence to understand the meanings D None of the above Marks 1.5 Unit 2

Id 99 Question Collaborative Filtering and Content Based Models are the two popular recommendation engines, what role does NLP play in building such algorithms. A Feature Extraction from text B Measuring Feature Similarity C Engineering Features for vector space learning model D All of these Marks 1.5 Unit 2

Id 100 Question What is the major difference between CRF (Conditional Random Field) and HMM (Hidden Markov Model)? A CRF is Generative whereas HMM is Discriminative model B CRF is Discriminative whereas HMM is Generative model C Both CRF and HMM are Generative model D Both CRF and HMM are Discriminative model Marks 1.5 Unit 2

Id 101 Question Dissimilarity between words expressed using cosine similarity will have values significantly higher than 0.5 A True B False Marks 1.5 Unit 2

Id 102 Question Which of the following are NLP use cases? A Detecting objects from an image B Facial Recognition C Speech Biometric D Text Summarization Marks 1.5 Unit 2

Id 103 Question How many terms are required for building a bayes model? A 3 B 2 C 1 D 4 Marks 1.5 Unit 2

Id 104 Question What is needed to make probabilistic systems feasible in the world? A Reliability B Crucial robustness C Feasibility D None of the mentioned Marks 1.5 Unit 2

Id 105 Question Where can Bayes’ rule be used? A Solving queries B Increasing complexity C Decreasing complexity D Answering probabilistic queries Marks 1.5 Unit 2

Id 106 Question Begins by hypothesizing a sentence (the symbol S) and successively predicting lower level constituents until individual preterminal symbols are written in______parser. A Top down B Bottom up C LL parser D Both B and C Marks 1.5 Unit 2

Id 107 Question How can the entries in the full joint probability distribution be calculated? A Using variables B Using information C Both Using variables & information D None of the mentioned Marks 1.5 Unit 2

Id 108 Question You have collected data of about 250 rows of news fields. You want to create a news classification model that categorizes each of the news fields into four categories such as politics, sports, careers and entertainment. Which of the following models can perform news classification with regard to the context mentioned above? A SVM B Naive Bayes C Both A and B D None of the above Marks 1.5 Unit 2

Id 109 Question Which of the following features can be used for accuracy improvement of a classification model? A Frequency count of terms B Vector Notation of sentence C Dependency Grammar D All of these Marks 1.5 Unit 2

Id 110 Question Solve the equation according to the sentence “I am planning to visit New Delhi to attend Analytics course Delhi Hackathon”. A = (# of words with Noun as the part of speech tag) B = (# of words with Verb as the part of speech tag) C = (# of words with frequency count greater than one) What are the correct values of A, B, and C? A 5, 5, 2 B 5, 5, 0 C 7, 4, 2 D 7, 5, 1 Marks 1.5 Unit 2

Id 111 Question Naive Bayes classifiers are a collection of classification algorithms based on ______. A Naive theorem B Classification model C Bayes’ Theorem D None of the above Marks 1.5 Unit 2

Id 112 Question Gaussian is the model of ______ classifier. A Decision tree B SVM classifier C Naive Bayes D KNN Marks 1.5 Unit 2

Id 113 Question Solve the equation according to the sentence “I am playing for India and visit to Bangalore to attend T20 match”. A = (# of words with Noun as the part of speech tag) B = (# of words with Verb as the part of speech tag) C = (# of words with frequency count greater than one) What are the correct values of A, B, and C? A 5,4,1 B 6,2,1 C 5,3,1 D 4,3,2 Marks 1.5 Unit 2

Id 114 Question In a given example, a doctor knows that a cold causes fever 50% of the time, the prior probability of any patient having a cold is 1/50,000, and the prior probability of any patient having a fever is 1/20. If a patient has a fever, what is the probability he/she has a cold? A 0.101 B 0.0002 C 0.002 D 0.010 Marks 1.5 Unit 2

Id 115 Question In a given example, a doctor knows that meningitis causes a stiff neck 30% of the time, the prior probability of any patient having meningitis is 1/20,000, and the prior probability of any patient having a stiff neck is 1/40. If a patient has a stiff neck, what is the probability he/she has meningitis? A 0.006 B 0.0006 C 0.0063 D 0.100 Marks 1.5 Unit 2
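
Worked illustration for Ids 114 and 115: both questions are direct applications of Bayes' rule, P(cause | symptom) = P(symptom | cause) * P(cause) / P(symptom). The sketch below plugs in the numbers stated in the questions; the function name `posterior` and its parameter names are illustrative.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)."""
    return likelihood * prior / evidence

# Id 114: P(fever | cold) = 0.5, P(cold) = 1/50,000, P(fever) = 1/20
p_cold_given_fever = posterior(0.5, 1 / 50_000, 1 / 20)

# Id 115: P(stiff neck | meningitis) = 0.3, P(meningitis) = 1/20,000, P(stiff neck) = 1/40
p_meningitis_given_stiff_neck = posterior(0.3, 1 / 20_000, 1 / 40)

print(round(p_cold_given_fever, 6))             # 0.0002
print(round(p_meningitis_given_stiff_neck, 6))  # 0.0006
```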

Id 116 Question If one of the conditional probabilities is zero, then the entire expression becomes ______. A None B Zero C Undefined D None of the above. Marks 1.5 Unit 2

Id 117 Question For an HMM with N hidden states and V observable states, what are the dimensions of the parameter matrices A, B and ∏? A: transition matrix, B: emission matrix, ∏: initial probability matrix. A N*V, N*V, N*N B N*N, N*V, N*1 C N*N, V*V, N*1 D N*V, V*V, V*1 Marks 1.5 Unit 2

Id 118 Question Which of the following words contains both derivational as well as inflectional suffixes: A regularity B carefully C older D availabilities Marks 1.5 Unit 2

Id 119 Question Which one of the following is a top-down parser? A Recursive descent parser B Operator precedence parser C An LR(k) parser D An LALR(k) parser Marks 1.5 Unit 2

Id 120 Question Which of the following are true with respect to a top-down and a bottom-up parser? 1. A top-down parser never explores options that will not lead to a full parse. 2. A bottom-up parser never explores options that will not lead to a full parse. 3. A top-down parser never explores options that do not connect to the actual sentence. 4. A bottom-up parser never explores options that do not connect to the actual sentence. A 1 B 2 C 3 D 1 and 4 Marks 1.5 Unit 2

Id 121 Question What is the Lin similarity between ‘vehicle’ and ‘Table ware’? A 0.138 B 0.215 C 0.78 D None of the above Marks 1.5 Unit 2

Id 122 Question Which of the following is the lowest common hypernym for the pair of words “student” and “teacher” as per wordnet? A Person B Body C Professional D lady Marks 1.5 Unit 2

Id 123 Question Consider the following corpus c1 of 4 sentences. What is the total count of unique bi-grams for which the likelihood will be estimated? Assume we do not perform any pre-processing. 1. Today is shreya’s birthday 2. she loves ice cream. 3. She is also fond of cream cake 4. we will celebrate her birthday with ice cream cake A 24 B 28 C 27 D 23 Marks 1.5 Unit 2

Id 124 Question A 4-gram model is a ______order Markov model. A Constant B One C Two D Three Marks 1.5 Unit 2

Id 125 Question Which of the following doesn’t require application of NLP algorithms? A Classifying spam emails from good ones B Classifying images of scanned documents as “hand-written” or “printed” documents. C Automatically generating captions for images. D Building a sentiment analyzer for tweets on Twitter. Marks 1.5 Unit 2

Id 126 Question In which of the following steps can ambiguity occur? A Tokenization B Language understanding C Sentence segmentation D All of the above Marks 1.5 Unit 2

Id 127 Question Which of the following is an instance of stemming? A I-am B Cooking-cook C Eat-ea D Sleep-slp Marks 1.5 Unit 2

Id 128 Question Word segmentation is mostly used when A No spaces between words B Duplication of words C Hyphens are present D Long sentence Marks 1.5 Unit 3

Id 129 Question In the sentence, “In Delhi I took my hat off. But I can’t put it back on.”,total number of word tokens and word types are: A 14,13 B 13,14 C 15,14 D 14,15 Marks 1.5 Unit 3
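
Worked illustration for Id 129: a minimal sketch that counts word tokens (every occurrence) and word types (distinct forms). It assumes punctuation is not counted, the contraction "can’t" is kept as one token, and comparison is case-sensitive; other tokenization conventions would give different counts.

```python
import re

text = "In Delhi I took my hat off. But I can’t put it back on."

# Runs of letters, allowing an apostrophe inside a word; punctuation is dropped.
tokens = re.findall(r"[A-Za-z’']+", text)
types = set(tokens)

print(len(tokens))  # 14 word tokens
print(len(types))   # 13 word types ("I" occurs twice)
```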

Id 130 Question Retrieval based models and Generative models are the two popular techniques used for building . Which of the following is an example of retrieval model and generative model respectively. A Dictionary based learning and Word 2 vector model B Rule-based learning and Sequence to Sequence model C Word 2 vector and Sentence to Vector model D Recurrent neural network and conventional neural network Marks 1.5 Unit 3

Id 131 Question Collaborative Filtering and Content Based Models are the two popular recommendation engines, what role does NLP play in building such algorithms. A Feature Extraction from text B Measuring Feature Similarity C Engineering Features for vector space learning model D All of these Marks 1.5 Unit 3

Id 132 Question While working on context extraction from text data, you encountered two different sentences: The tank is full of soldiers. The tank is full of nitrogen. Which of the following measures can be used to resolve the word sense ambiguity in the sentences? A Compare the dictionary definition of an ambiguous word with the terms contained in its neighborhood B Co-reference resolution, in which one resolves the meaning of an ambiguous word using the proper noun present in the previous sentence C Use dependency parsing of the sentence to understand the meanings D None of the above Marks 1.5 Unit 3

Id 133 Question Assume that there are 10000 documents in a collection. Out of these, 50 documents contain the terms “Good evening”. If “good evening” appears 3 times in a particular document, what is the TFIDF value of the terms for that document? A 15.8 B 12.8 C 13.4 D 12.3 Marks 1.5 Unit 3
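
Worked illustration for Id 133: a sketch of one common TF-IDF variant, assuming term frequency is the raw count in the document and IDF is the natural logarithm of (total documents / documents containing the term). The question does not state the log base or any normalization, so these are assumptions; with them the value comes out near the 15.8 to 15.9 range listed in the options. The function name `tf_idf` is illustrative.

```python
import math

def tf_idf(term_count, n_docs, docs_with_term):
    """Raw-count TF multiplied by natural-log IDF (assumed variant)."""
    return term_count * math.log(n_docs / docs_with_term)

score = tf_idf(term_count=3, n_docs=10_000, docs_with_term=50)
print(round(score, 2))  # 15.89
```

Using a base-10 or base-2 logarithm instead would scale the result by a constant factor, so the base matters when comparing against fixed answer options.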

Id 134 Question Most successful general purpose document retrieval methods are ______ methods. A Language B statistical C classification D All of the above Marks 1.5 Unit 3

Id 135 Question ______ is one that searches a collection of natural language documents. A Search B Information retrieval system C classification D Both A and C Marks 1.5 Unit 3

Id 136 Question In the Information retrieval system______is used to arrange documents. A Indexing B capitalize C Language structure D None of the above Marks 1.5 Unit 3

Id 137 Question Indexing is the process of selecting terms to represent a text. A False B True Marks 1.5 Unit 3

Id 138 Question Indexing technique consists of ______ model. A Data model B Algebraic model C Vector space model D All of these Marks 1.5 Unit 3

Id 139 Question In the sentence, “Today is rainy day, you should carry umbrella.”, total number of word tokens are ______. A 8 B 7 C 6 D 5 Marks 1.5 Unit 3

Id 140 Question In ______ model queries are represented as Boolean combinations of the terms. A SVM B Naive C Boolean D Both A and B Marks 1.5 Unit 3

Id 141 Question ______ is the process of computing a measure of similarity between two text representations. A Matching B Indexing C compare D Hashing Marks 1.5 Unit 3

Id 142 Question ______ is simply the number of times a given term appears in that document. A Document frequency B Term frequency C Language frequency D All of these Marks 1.5 Unit 3

Id 143 Question (total no. of documents)/(no. of documents containing ith term): the given formula represents ______. A Inverse document frequency B Document frequency C frequency distribution D None of the above Marks 1.5 Unit 3

Id 144 Question You created a document term matrix on the input data of 20K documents for a Machine learning model. Which of the following can be used to reduce the dimensions of data? A Keyword Normalization B Latent Semantic Indexing C Latent Dirichlet Allocation D All of the above Marks 1.5 Unit 3

Id 145 Question Which of the text parsing techniques can be used for noun phrase detection, verb phrase detection, subject detection, and object detection in NLP. A Part of speech tagging B Skip Gram and N-Gram extraction C Continuous Bag of Words D Dependency Parsing and Constituency Parsing Marks 1.5 Unit 3

Id 146 Question In a corpus of N documents, one randomly chosen document contains a total of T terms and the term “hello” appears K times. What is the correct value for the product of TF (term frequency) and IDF (inverse-document-frequency), if the term “hello” appears in approximately one-third of the total documents? A KT * Log(3) B T * Log(3) / K C K * Log(3) / T D Log(3) / KT Marks 1.5 Unit 3

Id 147 Question In NLP, the algorithm decreases the weight for commonly used words and increases the weight for words that are not used very much in a collection of documents A Term Frequency (TF) B Inverse Document Frequency (IDF) C Word2Vec D Latent Dirichlet Allocation (LDA) Marks 1.5 Unit 3

Id 148 Question TF-IDF helps you to establish? A most frequently occurring word in the document B most important word in the document C Both A and B D None of the above Marks 1.5 Unit 3

Id 149 Question Assume a corpus with 350 tokens in it. We have 20 word types in that corpus (V = 20). The frequency (unigram count) of word types “short” and “fork” are 25 and 15 respectively. If we are using Laplace smoothing, which of the following is P_Laplace(“fork”)? A 15/350 B 16/370 C 30/370 D 31/370 Marks 1.5 Unit 3
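
Worked illustration for Id 149: a minimal sketch of add-one (Laplace) smoothing for unigrams, P_Laplace(w) = (count(w) + 1) / (N + V), where N is the number of tokens and V the number of word types. The function name `laplace_unigram` is illustrative; `Fraction` is used so the result can be compared exactly with the fractional options.

```python
from fractions import Fraction

def laplace_unigram(count, total_tokens, vocab_size):
    """Add-one smoothed unigram probability: (count + 1) / (N + V)."""
    return Fraction(count + 1, total_tokens + vocab_size)

p_fork = laplace_unigram(count=15, total_tokens=350, vocab_size=20)

print(p_fork)                       # 8/185, which is 16/370 in lowest terms
print(p_fork == Fraction(16, 370))  # True
```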

Id 150 Question Assume that there are 10000 documents in a collection. Out of these, 50 documents contain the terms “difficult task”. If “difficult task” appears 3 times in a particular document, what is the TFIDF value of the terms for that document? A 8.11 B 11.89 C 15.9 D 14.3 Marks 1.5 Unit 3

Id 151 Question Ideally both precision and recall should be ______. A 1 B 0 C 2 D Null Marks 1.5 Unit 3

Id 152 Question Distributed indexing is used in: A Parallel tasking B Web-scale indexing C Google data centers D All of the above Marks 1.5 Unit 3

Id 153 Question Which is a good idea for using skip pointers? A Fewer skips, larger skip spans B None C Depends upon the no. of comparisons needed D More skips, shorter skip spans Marks 1.5 Unit 3

Id 154

Question A large repository of documents in IR is called a: A Corpus B Database C Dictionary D Collection Marks 1.5 Unit 3

Id 155

Question A benefit of using a hash table is: A Do not need to rehash everything periodically if vocabulary keeps growing. B Lookup in a hash table is faster than lookup in a tree. C No prefix search is required D All of the above Marks 1.5 Unit 3

Id 156

Question We need external sorting algorithms to: A Maximize the disk seek time. B Maintain constant disk seek time C Minimize the disk seek time. D None of the above Marks 1.5 Unit 3

Id 157

Question For query optimization while intersecting two postings lists, we should: A Process in the order of increasing document frequency B Process in any order C None of the above D Process in the order of decreasing document frequency Marks 1.5 Unit 3
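
The ordering heuristic in Id 157 can be sketched with the standard linear merge of sorted postings lists. The function names and the sample postings are hypothetical; the point is that starting from the shortest list keeps every intermediate result small.

```python
def intersect(p1, p2):
    """Linear-time merge of two sorted postings lists (lists of doc IDs)."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer

def intersect_many(postings_lists):
    """Intersect several lists, processing them by increasing length (document frequency)."""
    if not postings_lists:
        return []
    ordered = sorted(postings_lists, key=len)
    result = ordered[0]
    for postings in ordered[1:]:
        if not result:          # nothing left to match, stop early
            break
        result = intersect(result, postings)
    return result

# Hypothetical postings lists for three query terms.
term_a = [1, 2, 4, 11, 31, 45, 173, 174]
term_b = [1, 2, 4, 5, 6, 16, 57, 132]
term_c = [2, 31, 54, 101]
common = intersect_many([term_a, term_b, term_c])
```

Here the shortest list (`term_c`) is processed first, so no intermediate result can exceed four entries.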

Id 158

Question The goal of IR is to: A find documents relevant to an information need B find documents relevant to an information need from a given document set C find documents relevant to an information need from a large document set D find documents relevant to an information need from a small document set Marks 1.5 Unit 3

Id 159

Question Lemmatization is a technique for: A Ranking documents B Case folding C Normalization D Tokenization Marks 1.5 Unit 3

Id 160

Question A model of information retrieval in which we can pose any query in which search terms are combined with the operators AND, OR, and NOT: A Ad Hoc Retrieval B Ranked Retrieval Model C Boolean Information Model D Proximity query Model Marks 1.5 Unit 3

Id 161

Question The model of information retrieval in which we can pose any query in the form of a Boolean expression is called the ranked retrieval model. A True B False Marks 1.5 Unit 3

Id 162

Question In information retrieval, extremely common words that appear to be of little value in helping select documents, and that are excluded from the index vocabulary, are called: A Stop Words B Tokens C Simple words D Stemmed Terms Marks 1.5 Unit 3

Id 163

Question Which one of the following is not a pre-processing technique in NLP? A Stemming and Lemmatization B removing punctuations C removal of stop words D Marks 1.5 Unit 3

Id 164

Question Which of the following is(are) NOT true with Google Search Engine? A It offers specialized search services B It does stemming C It does stop-word removal D None of the choices Marks 1.5 Unit 3

Id 165

Question A fragment from an inverted index (augmented with positional information) is given below. Information: d1: 12; d2: 23, 32, 43; d3: 13; d5: 32, 45, 80. systems: d1: 15; d2: 34, 42; d3: 35; d5: 38. Which of the following phrase(s) has (have) possible occurrences in the above document sequence? A Information retrieval systems B Information system C Information theory retrieval system D All of the above Marks 1.5 Unit 3
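
One way to reason about Id 165 is to ask in which documents "systems" occurs exactly k positions after "information". The sketch below encodes the postings from the question; the function name is hypothetical, and it assumes a phrase with k - 1 intervening words needs a gap of exactly k (it does not address whether "system" and "systems" are conflated by stemming).

```python
# Positional postings copied from the question: doc ID -> token positions.
information = {"d1": [12], "d2": [23, 32, 43], "d3": [13], "d5": [32, 45, 80]}
systems = {"d1": [15], "d2": [34, 42], "d3": [35], "d5": [38]}

def docs_with_gap(first, second, gap):
    """Documents where `second` occurs exactly `gap` positions after `first`."""
    hits = []
    for doc, positions in first.items():
        later = set(second.get(doc, []))
        if any(pos + gap in later for pos in positions):
            hits.append(doc)
    return hits

adjacent = docs_with_gap(information, systems, 1)   # "information systems"
one_between = docs_with_gap(information, systems, 2)  # one word in between
two_between = docs_with_gap(information, systems, 3)  # two words in between
```

A non-empty result only shows that the phrase is positionally possible; the index fragment cannot confirm which words actually fill the gap.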

Id 166

Question If X denotes the length of string s1 and Y denotes the length of string s2, then the edit distance between s1 and s2 is never more than ______. A Min(X,Y) B Max(X,Y) C X+Y D None of the above Marks 1.5 Unit 3
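
The bound in Id 166 can be explored with a small dynamic-programming sketch. It assumes the Levenshtein variant of edit distance, where insertion, deletion and substitution each cost 1; other cost schemes (for example substitution cost 2) give a different bound. The function name is illustrative.

```python
def edit_distance(s1, s2):
    """Levenshtein distance with unit costs, using two rolling rows."""
    prev = list(range(len(s2) + 1))          # distance from "" to each prefix of s2
    for i, c1 in enumerate(s1, start=1):
        curr = [i]                           # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,         # delete c1
                            curr[j - 1] + 1,     # insert c2
                            prev[j - 1] + cost)) # substitute or match
        prev = curr
    return prev[-1]

pairs = [("kitten", "sitting"), ("abc", "xyz"), ("abc", ""), ("flaw", "lawn")]
distances = {pair: edit_distance(*pair) for pair in pairs}
```

With unit costs, substituting every character of the shorter string and inserting the remainder always suffices, which is where the upper bound comes from.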

Id 167

Question Given a document collection of 1000 documents which has 110 relevant documents for a given query and if the IR system retrieves 30 relevant and 15 irrelevant documents, what is the recall value of the system? A 0.27 B 0.58 C 0.4 D 0.3 Marks 1.5 Unit 3

Id 168

Question Given a document collection which has 35 relevant documents, if an IR system retrieves 10 relevant and 13 irrelevant documents, what is the precision value of the system? A 0.44 B 0.25 C 0.43 D 0.24 Marks 1.5 Unit 3
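
The recall and precision asked for in Id 167 and Id 168 follow directly from the standard definitions: precision = relevant retrieved / total retrieved, and recall = relevant retrieved / total relevant. The helper names below are hypothetical; the numbers are those given in the two questions.

```python
def precision(relevant_retrieved, irrelevant_retrieved):
    """Fraction of retrieved documents that are relevant."""
    return relevant_retrieved / (relevant_retrieved + irrelevant_retrieved)

def recall(relevant_retrieved, total_relevant):
    """Fraction of relevant documents that were retrieved."""
    return relevant_retrieved / total_relevant

# Id 167: 110 relevant documents exist, 30 relevant and 15 irrelevant retrieved.
recall_167 = recall(30, 110)
# Id 168: 35 relevant documents exist, 10 relevant and 13 irrelevant retrieved.
precision_168 = precision(10, 13)
```

The collection size (1000 documents in Id 167) does not enter either formula.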

Id 169

Question Consider the following documents: Doc1: new home sales top forecasts Doc2: home sales rise in july Doc3: increase in home sales in july Doc4: july new home sales rise When the term-document incidence matrix is constructed and the query home AND (new OR july) is executed on it, the documents retrieved will be ______. A Doc1 B Doc1,Doc2 C Doc1,Doc2,Doc4 D Doc1,Doc2,Doc3,Doc4 Marks 1.5 Unit 3
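
The Boolean query in Id 169 can be traced with a small term-document incidence matrix. This is a minimal sketch assuming whitespace tokenization; the helper names `bool_and` and `bool_or` are hypothetical.

```python
docs = {
    "Doc1": "new home sales top forecasts",
    "Doc2": "home sales rise in july",
    "Doc3": "increase in home sales in july",
    "Doc4": "july new home sales rise",
}

# Incidence matrix: one 0/1 row per term, one column per document.
vocab = sorted({word for text in docs.values() for word in text.split()})
incidence = {
    term: [int(term in text.split()) for text in docs.values()]
    for term in vocab
}

def bool_and(a, b):
    return [x & y for x, y in zip(a, b)]

def bool_or(a, b):
    return [x | y for x, y in zip(a, b)]

# home AND (new OR july)
mask = bool_and(incidence["home"], bool_or(incidence["new"], incidence["july"]))
retrieved = [name for name, bit in zip(docs, mask) if bit]
```

Since "home" occurs in every document, the result is decided by which documents contain "new" or "july".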

Id 170

Question Yahoo search engine uses stemming for its Index generation. A True B False Marks 1.5 Unit 3

Id 171

Question Select correct statements related to Sentiment analysis or opinion mining. A It is related to the application of natural language processing B Computational linguistics C Text analytics to identify and extract subjective D Text information in source materials Marks 1.5 Unit 3

Id 172

Question Select the Tools/Techniques that can be used with sentiment analysis. A B Semantic Orientation i.e., based on Point wise mutual Information C Grammatical dependency relations D All of these Marks 1.5 Unit 3

Id 173

Question Select correct statements related to the tasks of sentiment analysis or opinion mining. A Classifying the polarity of a given text at the document, sentence, or feature/aspect level B Checking whether the expressed opinion in a document, a sentence, or an entity feature/aspect is positive, negative, or neutral. C Some advanced, "beyond polarity" sentiment classification tasks look, for instance, at emotional states such as "angry," "sad," and "happy." D All of the above Marks 1.5 Unit 3

Id 174

Question Why are opinions important? A “Opinions” are key influencers of our behaviors B Whenever we need to make a decision, we often seek out the opinions of others. C Our beliefs and perceptions of reality are conditioned on how others see the world D All of the above Marks 1.5 Unit 3

Id 175

Question Identify the correct statements related to "Subjectivity and emotion." A Subjectivity: can be identified by using professional emotions or sentiments only. B Sentence subjectivity: a subjective sentence expresses some personal feelings, views, emotions, or beliefs. C Emotion: Emotions are people’s subjective feelings and thoughts. D Both B and C Marks 1.5 Unit 3

Id 176

Question The Subjectivity of any text depends upon ______. A Subjective expressions, e.g., opinions, allegations, desires, beliefs, suspicions, speculations B subjective sentences that may contain a positive or negative opinion C The sentence: “After taking the drug, there is no more pain”, is a subjective sentence D All of the above Marks 1.5 Unit 3

Id 177

Question Parts-of-Speech tagging determines A part-of-speech for each word dynamically as per meaning of the sentence B part-of-speech for each word dynamically as per sentence structure C all part-of-speech for a specific word given as input D all of the mentioned Marks 1.5 Unit 3

Id 178

Question You want to buy a new mobile this Diwali, and lots of reviews of different mobiles are available. So, you have decided to use a text summarization method to find out what people’s opinions are in general about a product. What kind of summarization method would you use to understand the reviews? A Extractive multi document summarization B Abstractive multi document summarization C Both A and B D None of the above Marks 1.5 Unit 3

Id 179

Question Which of the following affective states is used in sentiment analysis? A Personality traits B moods C attitudes D emotions Marks 1.5 Unit 3

Id 180

Question Sentiment lexicons can be learned using intuitions such as______. A Same polarity words connected with ‘and’ B Opposite polarity words connected with ‘but’ C Both A and B D None of the above Marks 1.5 Unit 3

Id 181

Question Consider the sentence: “The touch screen was cool, but camera and voice clarity were very poor”. Which of the following is true? A Aspect: touch screen, opinion: positive, phrase: cool B Aspect: camera, opinion: positive, phrase: very poor C Aspect: voice, opinion: positive, phrase: very poor D Both B and C Marks 1.5 Unit 3

Id 182

Question Which of the following techniques is not a part of flexible text matching? A Soundex B Metaphone C Edit Distance D Keyword Hashing Marks 1.5 Unit 3

Id 183

Question What are the possible features of a text corpus? 1. Boolean feature – presence of word in a document 2. Vector notation of word 3. Part of Speech Tag 4. Basic Dependency Grammar 5. Entire document as a feature A 1,2 B 1,2,3 C 1,2,3,4,5 D 1,2,5 Marks 1.5 Unit 3

Id 184

Question Google Search’s feature – “Did you mean”, is a mixture of different techniques. Which of the following techniques are likely to be ingredients? 1. Collaborative Filtering model to detect similar user behaviors (queries) 2. Model that checks for among the dictionary terms 3. Translation of sentences into multiple languages A 1,2 B 1 C 1,2,3 D None of the above Marks 1.5 Unit 3

Id 185

Question In linguistic morphology, ______is the process for reducing inflected words to their root form. A Rooting B Stemming C Text-Proofing D Both Rooting & Stemming Marks 1.5 Unit 3

Id 186

Question Speech Segmentation is a subtask of Speech Recognition. A True B False Marks 1.5 Unit 3

Id 187

Question IR (Information Retrieval) and IE (Information Extraction) are the same thing. A True B False Marks 1.5 Unit 3

Id 188

Question OCR (Optical Character Recognition) uses NLP. A True B False Marks 1.5 Unit 3

Id 189

Question In the sentence “car engine is excellent but speed is worst”, what are the aspects and opinions? A Aspects: car engine, speed; opinions: excellent, worst B Aspects: car, speed; opinions: excellent C Aspects: car engine; opinions: excellent, worst D Both B and C Marks 1.5 Unit 3

Id 190

Question ______ is used to build a parse tree. A Part of speech tagging B words C syntax D phrases Marks 1.5 Unit 3

Id 191

Question Google Maps uses the ______ method for quick understanding. A keywords B search C Part of speech tag (POS) D Naïve Bayes Marks 1.5 Unit 3

Id 192

Question ______ systems perform sentiment analysis based on a set of manually crafted rules. A Hybrid B Rule-based C Automatic D Both A and C Marks 1.5 Unit 3

Id 193

Question ______ systems rely on machine learning techniques to learn from data. A Automatic B Hybrid C Rule-based D None of the above Marks 1.5 Unit 3

Id 194

Question The typical architecture for an ______system begins by segmenting, tokenizing, and part-of-speech tagging the text. A Information extraction B Information classification C Information retrieval D All of the above Marks 1.5 Unit 3

Id 195

Question ______ systems rely on machine learning techniques as well as manually designed rules to learn from data. A Hybrid B Rule-based C Machine translation D Automatic Marks 1.5 Unit 3

Id 196

Question In the sentence “Kurta fabric is good, fitting is also perfect and affordable in price”, which type of opinion polarity is present? A Negative B Positive C Neutral D All of the above Marks 1.5 Unit 3

Id 197

Question Whenever we need to make a ______, we often seek out the _____of others. A Opinion,determine B Opinion, attitude C Decision,opinion D All of the above Marks 1.5 Unit 3

Id 198

Question ______ is simply the number of times a given term appears in a document. A Term frequency B Document frequency C Inverse term frequency D All of the above Marks 1.5 Unit 3

Id 199

Question In the sentence “I would like to go Himalayas for meditation”, the total number of word tokens is ______. A 6 B 7 C 8 D 9 Marks 1.5 Unit 3
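
The token count in Id 199 can be verified with a one-line tokenizer. This sketch assumes simple whitespace tokenization, which is adequate here because the sentence contains no punctuation or contractions; the variable names are illustrative.

```python
sentence = "I would like to go Himalayas for meditation"

# Whitespace tokenization (an assumption; real tokenizers also split punctuation).
tokens = sentence.split()
token_count = len(tokens)
type_count = len(set(tokens))   # distinct word types, for comparison
```

In this sentence every token is a distinct type, so the two counts coincide; that is not true in general.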