www.YoYoBrain.com - Accelerators for Memory and Learning Questions for Natural Language Processing

Category: Default - (33 questions)

1. NLTK: function to recognize named entities like person, place, etc.
    nltk.ne_chunk()

2. NLTK: function to break text into parts of speech
    nltk.pos_tag()

3. NLTK: theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text
    latent semantic analysis (LSA)

4. NLTK: class to perform Porter stemming
    PorterStemmer

5. NLTK: how to tokenize strings
    nltk.tokenize.word_tokenize()

6. NLTK: class for lemmatizing
    WordNetLemmatizer

7. NLTK: how to tokenize sentences
    nltk.tokenize.sent_tokenize()

8. NLTK: tokenize a string or document based on a regular expression
    regexp_tokenize()

9. NLTK: special class for tweet tokenization
    TweetTokenizer

10. NLTK: class that has stop words
    nltk.corpus.stopwords

11. NLP: popular open-source NLP library
    Gensim

12. Gensim: how to create a gensim dictionary
    from gensim.corpora.dictionary import Dictionary
    dictionary = Dictionary(tokenized_docs)

13. Gensim: how to create a corpus with dictionary and tokenized_docs
    corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]

14. Gensim: dictionary attribute that returns ids from the text
    dictionary.token2id

15. Gensim: how to get the id for "computer" from dictionary
    dictionary.token2id.get("computer")

16. Gensim: how to fit a tf-idf model using corpus
    from gensim.models.tfidfmodel import TfidfModel
    tfidf = TfidfModel(corpus)

17. NLP: the SpaCy library
    NLP library with a focus on creating NLP pipelines to generate models and corpora

18. SpaCy: how to load the pre-trained English model
    import spacy
    nlp = spacy.load('en')

19. SpaCy: how to load text into the spacy model nlp
    doc = nlp(text)

20. SpaCy: how to see the entities from a loaded document doc = nlp(text)
    doc.ents

21. NLP: library to do multilingual named entity recognition
    polyglot

22. Polyglot: how to do named entity recognition
    from polyglot.text import Text
    ptext = Text(text)

23. NLP: Flesch reading ease
    Rank of ease of reading English text; a higher score means easier to read. Depends on: average sentence length; number of syllables per word.

24. NLP: Gunning fog index
    Score of readability of English; lower is easier to read. Depends on: average sentence length; percentage of complex words (3 or more syllables).

25. NLP: library for calculating different reading difficulty scores
    textatistic

26. NLP: score text reading level using the textatistic library
    from textatistic import Textatistic
    readability_scores = Textatistic(text).scores

27. SpaCy: how to generate a list of tokens from a tokenized doc
    tokens = [token.text for token in doc]

28. SpaCy: how to generate a list of lemmas from a tokenized doc
    lemmas = [token.lemma_ for token in doc]

29. SpaCy: list of stopwords
    spacy.lang.en.stop_words.STOP_WORDS

30. SpaCy: how to generate a list of (token, POS tag) tuples from a tokenized doc
    pos = [(token.text, token.pos_) for token in doc]

31. SpaCy: how to generate a list of (text, entity label) tuples from a tokenized doc
    ne = [(ent.text, ent.label_) for ent in doc.ents]

32. SpaCy: how to generate a list of embeddings from a tokenized doc
    vectors = [token.vector for token in doc]

33. SpaCy: how to calculate word embedding similarity between two tokens token1 and token2, or two docs doc1 and doc2
    token1.similarity(token2)
    doc1.similarity(doc2)