Between Custom and Off-the-shelf NLP
Yves Peirsman

NLP LANDSCAPE

Cloud APIs: train your own models
Cloud APIs: pre-trained models
Libraries: train your own models
Libraries: pre-trained models

SENTIMENT ANALYSIS

“The process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc., is positive, negative, or neutral.”

SENTIMENT ANALYSIS: APPLICATIONS

http://tarsier.monkeylearn.com
http://varianceexplained.org/r/trump-tweets/
http://www.stockfluence.com/

OVERVIEW

Off-the-shelf NLP
DIY NLP
Conclusions

NLP LANDSCAPE

Cloud APIs: train your own models
Cloud APIs: pre-trained models
Libraries: train your own models
Libraries: pre-trained models

DATA SETS

Domain               Source              Categories                                         Baseline
Movie reviews        rottentomatoes.com  positive, negative                                 50.0%
Baby products        Amazon              positive (****, *****), negative (*, **)           81.5%
Android apps         Amazon              positive (****, *****), negative (*, **)           72.7%
Android apps         Amazon, Wikipedia   positive (****, *****), negative (*, **), neutral  51.7%
Hotels, restaurants  Yelp                positive (****, *****), negative (*, **)           70.8%

DATA SETS: EXAMPLES

Positive: If you're looking for something scary, this is the first great horror film of the spooky season.

Neutral: Avernum is a series of demoware role-playing video games by Jeff Vogel of Spiderweb Software available for Macintosh and Windows-based computers. Several are available for iPad and Android tablet.

Negative: It's Starbucks only with bad customer service. Baristas with attitude that don't know their own product. If I'm paying $7.00 for a coffee at least drop the 'tude.

1. OFF-THE-SHELF NLP

Is it OK to be lazy?
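
The accuracy figures in this part of the talk are read against the Baseline column of the DATA SETS slide. Below is a minimal Python sketch of such a comparison, assuming the baseline is the share of the most frequent category (the slides do not define it). The sample texts, `keyword_classifier` and `always_positive` are hypothetical stand-ins for illustration; they are not the vendors' APIs or the talk's data.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (text, gold label)

# Hypothetical labeled sample; not taken from the talk's data sets.
SAMPLE: List[Example] = [
    ("great film, loved it", "positive"),
    ("bad service and a bad attitude", "negative"),
    ("really great stroller", "positive"),
    ("not great, not bad", "negative"),
    ("loved the app", "positive"),
]


def majority_baseline(data: List[Example]) -> float:
    """Accuracy of always predicting the most frequent gold label (assumed baseline)."""
    counts = Counter(label for _, label in data)
    return counts.most_common(1)[0][1] / len(data)


def accuracy(classify: Callable[[str], str], data: List[Example]) -> float:
    """Fraction of examples for which the classifier returns the gold label."""
    correct = sum(1 for text, label in data if classify(text) == label)
    return correct / len(data)


def keyword_classifier(text: str) -> str:
    """Toy stand-in for an off-the-shelf solution: counts a few cue words."""
    positive_words = {"great", "loved"}
    negative_words = {"bad"}
    tokens = text.replace(",", " ").split()
    score = sum(t in positive_words for t in tokens) - sum(t in negative_words for t in tokens)
    return "positive" if score >= 0 else "negative"


def always_positive(text: str) -> str:
    """Second toy stand-in: ignores the text entirely."""
    return "positive"


def compare(solutions: Dict[str, Callable[[str], str]], data: List[Example]) -> Dict[str, float]:
    """Score every candidate solution on the same labeled data."""
    return {name: accuracy(fn, data) for name, fn in solutions.items()}


if __name__ == "__main__":
    print(f"baseline: {majority_baseline(SAMPLE):.1%}")
    results = compare({"keyword": keyword_classifier, "always positive": always_positive}, SAMPLE)
    for name, score in sorted(results.items(), key=lambda item: -item[1]):
        print(f"{name}: {score:.1%}")
```

To evaluate real solutions, each entry passed to `compare` would be a thin wrapper that sends the text to a library or cloud API and maps its output to the same label set as the gold data.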
OFF-THE-SHELF MODELS

Variables
Data

OFF-THE-SHELF MODELS: MOVIE REVIEWS

Indico            76.8%
IBM AlchemyAPI    73.4%
Stanford CoreNLP  71.9%

OFF-THE-SHELF MODELS: BABY PRODUCTS

Indico                 92.6%
MonkeyLearn (Product)  87.5%
TextBlob Pattern       82.5%

OFF-THE-SHELF MODELS: YELP REVIEWS

Indico          92.9%
Google          91.0%
IBM AlchemyAPI  90.4%

OFF-THE-SHELF MODELS: ANDROID APPS (2-WAY)

Indico                 90.6%
Google                 90.5%
MonkeyLearn (Product)  87.1%

OFF-THE-SHELF MODELS: ANDROID APPS (3-WAY)

Indico         80.0%
HavenOnDemand  79.1%
Google         77.8%

OFF-THE-SHELF MODELS: CONCLUSIONS

There is enormous variation between and within off-the-shelf solutions.
High quality is possible, but not guaranteed.
Comparing available solutions on your data is crucial.

2. DIY MODELS

Are you better off building your own custom models?

DIY MODELS: PROCESS

Data
Library
Model

DIY MODELS: BABY PRODUCTS

DIY SVM             93.4%
Best off-the-shelf  92.6%
DIY Naive Bayes     86.3%

DIY MODELS: ANDROID APPS

DIY SVM             93.1%
Best off-the-shelf  90.6%
DIY Naive Bayes     90.3%

DIY MODELS: YELP REVIEWS

DIY SVM             95.6%
Best off-the-shelf  92.9%
DIY Naive Bayes     89.2%

DIY MODELS: BABY PRODUCTS (figure)

DIY MODELS: ANDROID APPS (figure)

DIY MODELS: CONCLUSIONS

You need sufficient relevant data to build a good model.
DIY models built with sufficient data will typically outperform off-the-shelf solutions.
DIY models may or may not be worth the effort.

3. Conclusions

CONCLUSIONS

Off-the-shelf                              DIY
no data                                    lots of data
little effort                              more effort
good quality possible, but not guaranteed  superior quality
no control                                 full control

ALL MODELS ARE WRONG BUT SOME ARE USEFUL - George Box

THANKS!

Any questions? You can find me at:
» @yvespeirsman
» [email protected]
