Questions for Natural Language Processing

www.YoYoBrain.com - Accelerators for Memory and Learning. Category: Default (33 questions)

NLTK: function to recognize named entities (person, place, etc.)
    nltk.ne_chunk()
NLTK: break words into parts of speech
    nltk.pos_tag()
NLTK: latent semantic analysis
    A theory and method for extracting and representing the contextual-usage meaning of words by applying statistical computations to a large corpus of text.
NLTK: class to perform word stemming
    PorterStemmer
NLTK: how to tokenize strings
    nltk.tokenize.word_tokenize()
NLTK: class for lemmatizing
    WordNetLemmatizer
NLTK: how to tokenize sentences
    nltk.tokenize.sent_tokenize()
NLTK: tokenize a string or document based on a regular expression
    regexp_tokenize()
NLTK: special class for tweet tokenization
    TweetTokenizer
NLTK: class that holds stop words
    nltk.corpus.stopwords
NLP: gensim
    A popular open-source NLP library.
Gensim: how to create a gensim dictionary
    from gensim.corpora.dictionary import Dictionary
    dictionary = Dictionary(tokenized_docs)
Gensim: how to create a corpus from a dictionary and tokenized_docs
    corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]
Gensim: dictionary attribute that returns token ids from the text
    dictionary.token2id
Gensim: how to get the id for "computer" from the dictionary
    dictionary.token2id.get("computer")
Gensim: how to fit a tf-idf model using a corpus
    from gensim.models.tfidfmodel import TfidfModel
    tfidf = TfidfModel(corpus)
NLP: spaCy library
    An NLP library focused on creating NLP pipelines to generate models and corpora.
SpaCy: how to load a pre-trained English model
    import spacy
    nlp = spacy.load('en')  (in newer spaCy versions: spacy.load('en_core_web_sm'))
SpaCy: how to run text through the spaCy model nlp
    doc = nlp(text)
SpaCy: how to see the entities from a loaded document doc = nlp(text)
    doc.ents
NLP: library to do multilingual named entity recognition
    polyglot
Polyglot: how to do named entity recognition
    from polyglot.text import Text
    ptext = Text(text)
    ptext.entities
NLP: Flesch reading ease
    A rank of how easy English text is to read; a higher score means easier to read. Depends on average sentence length and the number of syllables per word.
NLP: Gunning fog index
    A readability score for English text; lower is easier to read. Based on average sentence length and the percentage of complex words (3 or more syllables).
NLP: library for calculating different reading difficulty scores
    textatistic
NLP: score text reading level using the textatistic library
    from textatistic import Textatistic
    readability_scores = Textatistic(text).scores
SpaCy: how to generate a list of tokens from a tokenized doc
    tokens = [token.text for token in doc]
SpaCy: how to generate a list of lemmas from a tokenized doc
    lemmas = [token.lemma_ for token in doc]
SpaCy: list of stop words
    spacy.lang.en.stop_words.STOP_WORDS
SpaCy: how to generate a list of (token, POS tag) pairs from a tokenized doc
    pos = [(token.text, token.pos_) for token in doc]
SpaCy: list of (text, named-entity label) tuples from a tokenized doc
    ne = [(ent.text, ent.label_) for ent in doc.ents]
SpaCy: how to generate a list of word embeddings from a tokenized doc
    embeddings = [token.vector for token in doc]
SpaCy: how to calculate word-embedding similarity between two tokens token1 and token2, or two docs doc1 and doc2
    token1.similarity(token2)
    doc1.similarity(doc2)
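The gensim dictionary/corpus cards above boil down to two data structures: a token-to-id mapping and, per document, a list of (token_id, count) pairs. A minimal pure-Python sketch of what Dictionary and doc2bow produce (illustrative only; gensim assigns ids in its own order):

```python
from collections import Counter

def build_token2id(tokenized_docs):
    # Assign each distinct token an integer id in order of first appearance.
    token2id = {}
    for doc in tokenized_docs:
        for token in doc:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id

def doc2bow(doc, token2id):
    # Bag-of-words: sorted list of (token_id, count) pairs, like
    # dictionary.doc2bow(doc) in gensim.
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())

docs = [["the", "cat", "sat"], ["the", "cat", "ate", "the", "fish"]]
token2id = build_token2id(docs)
corpus = [doc2bow(d, token2id) for d in docs]
# corpus[1] -> [(0, 2), (1, 1), (3, 1), (4, 1)]  ("the" appears twice)
```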
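To make the TfidfModel card concrete, here is a pure-Python sketch of the weighting it applies by default, under the assumption that gensim's defaults are raw term frequency times log2(N / document frequency), L2-normalized per document (check the TfidfModel documentation for the smartirs variants):

```python
import math

def tfidf_weights(corpus):
    # corpus: list of bag-of-words documents, each a list of (term_id, tf).
    n_docs = len(corpus)
    # Document frequency: in how many documents does each term appear?
    df = {}
    for bow in corpus:
        for term_id, _ in bow:
            df[term_id] = df.get(term_id, 0) + 1
    weighted = []
    for bow in corpus:
        w = [(tid, tf * math.log2(n_docs / df[tid])) for tid, tf in bow]
        # Terms that occur in every document get weight 0 and are dropped.
        w = [(tid, v) for tid, v in w if v != 0.0]
        norm = math.sqrt(sum(v * v for _, v in w)) or 1.0
        weighted.append([(tid, v / norm) for tid, v in w])
    return weighted

tfidf_weights([[(0, 1), (1, 1)], [(0, 2), (2, 1)]])
# -> [[(1, 1.0)], [(2, 1.0)]]  (term 0 occurs everywhere, so it drops out)
```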
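The Flesch and Gunning fog cards can be written out as formulas. A sketch using the standard published coefficients (libraries such as textatistic differ mainly in how they count syllables and sentences):

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables):
    # Higher = easier to read; driven by average sentence length
    # and syllables per word.
    return (206.835
            - 1.015 * (n_words / n_sentences)
            - 84.6 * (n_syllables / n_words))

def gunning_fog(n_words, n_sentences, n_complex_words):
    # Lower = easier to read; "complex" means 3 or more syllables.
    return 0.4 * ((n_words / n_sentences)
                  + 100.0 * (n_complex_words / n_words))

flesch_reading_ease(100, 5, 130)  # ~76.6: fairly easy text
gunning_fog(100, 5, 10)          # 12.0: roughly high-school level
```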
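The regexp_tokenize and TweetTokenizer cards share one idea: tokenization as pattern matching. A stdlib-only stand-in using re.findall, with a hypothetical pattern that keeps #hashtags and @mentions whole the way a tweet-aware tokenizer would:

```python
import re

def regexp_tokenize(text, pattern):
    # Minimal stand-in for nltk.tokenize.regexp_tokenize: return all
    # non-overlapping matches of `pattern`, in order of appearance.
    return re.findall(pattern, text)

# Alternation is tried left to right, so "#NLP" matches as one token
# before the plain \w+ branch can split it.
tweet_pattern = r"[#@]\w+|\w+"
regexp_tokenize("Learning #NLP with @nltk today", tweet_pattern)
# -> ['Learning', '#NLP', 'with', '@nltk', 'today']
```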
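The similarity card (token1.similarity(token2), doc1.similarity(doc2)) is cosine similarity over the word vectors. The same computation on plain Python lists, as a sketch of what spaCy does internally:

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: 1.0 = same direction,
    # 0.0 = orthogonal (no similarity under this measure).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # zero vector: similarity undefined, report 0
    return dot / (norm_u * norm_v)

cosine_similarity([1.0, 0.0], [1.0, 0.0])  # -> 1.0
cosine_similarity([1.0, 0.0], [0.0, 1.0])  # -> 0.0
```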
