Between Custom and Off-the-shelf NLP
Yves Peirsman

NLP LANDSCAPE

  Cloud APIs: pre-trained models
  Cloud APIs: train your own models
  Libraries: pre-trained models
  Libraries: train your own models

SENTIMENT ANALYSIS

“The process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc., is positive, negative, or neutral.”

SENTIMENT ANALYSIS: APPLICATIONS

  http://tarsier.monkeylearn.com
  http://varianceexplained.org/r/trump-tweets/
  http://www.stockfluence.com/

OVERVIEW

  Off-the-shelf NLP
  DIY NLP
  Conclusions

NLP LANDSCAPE

  Cloud APIs: pre-trained models
  Cloud APIs: train your own models
  Libraries: pre-trained models
  Libraries: train your own models

DATA SETS

  Domain               Source              Categories                                         Baseline
  Movie reviews        rottentomatoes.com  positive, negative                                 50.0%
  Baby products        Amazon              positive (****, *****), negative (*, **)           81.5%
  Android apps         Amazon              positive (****, *****), negative (*, **)           72.7%
  Android apps         Amazon, Wikipedia   positive (****, *****), negative (*, **), neutral  51.7%
  Hotels, restaurants  Yelp                positive (****, *****), negative (*, **)           70.8%

DATA SETS: EXAMPLES

  Positive: If you're looking for something scary, this is the first great horror film of the spooky season.
  Neutral:  Avernum is a series of demoware role-playing video games by Jeff Vogel of Spiderweb Software available for Macintosh and Windows-based computers. Several are available for iPad and Android tablet.
  Negative: It's Starbucks only with bad customer service. Baristas with attitude that don't know their own product. If I'm paying $7.00 for a coffee at least drop the 'tude.

1. OFF-THE-SHELF NLP
Is it OK to be lazy?
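The Baseline column in DATA SETS is the yardstick for every accuracy figure in the deck. A minimal sketch of one common way to get such a number, assuming the baseline is the accuracy of always predicting the most frequent category (the slides do not say how it was computed). The function name and the toy labels are hypothetical, not the data from the talk.

```python
from collections import Counter


def majority_baseline(labels):
    """Return (most frequent label, accuracy of always predicting it)."""
    if not labels:
        raise ValueError("labels must not be empty")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Hypothetical toy labels: 8 positive and 2 negative reviews.
toy_labels = ["positive"] * 8 + ["negative"] * 2
label, accuracy = majority_baseline(toy_labels)
print(f"always predict {label!r}: {accuracy:.1%}")
```

A model only adds value on a data set if it clearly beats this number, which is why a score in the 80s can still be weak on a skewed data set.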
OFF-THE-SHELF MODELS

  Variables
  Data

OFF-THE-SHELF MODELS: MOVIE REVIEWS

  Indico                 76.8%
  IBM AlchemyAPI         73.4%
  Stanford CoreNLP       71.9%

OFF-THE-SHELF MODELS: BABY PRODUCTS

  Indico                 92.6%
  MonkeyLearn (Product)  87.5%
  TextBlob Pattern       82.5%

OFF-THE-SHELF MODELS: YELP REVIEWS

  Indico                 92.9%
  Google                 91.0%
  IBM AlchemyAPI         90.4%

OFF-THE-SHELF MODELS: ANDROID APPS (2-WAY)

  Indico                 90.6%
  Google                 90.5%
  MonkeyLearn (Product)  87.1%

OFF-THE-SHELF MODELS: ANDROID APPS (3-WAY)

  Indico                 80.0%
  HavenOnDemand          79.1%
  Google                 77.8%

OFF-THE-SHELF MODELS: CONCLUSIONS

  There is enormous variation between and within off-the-shelf solutions.
  High quality is possible, but not guaranteed.
  Comparing available solutions on your data is crucial.

2. DIY MODELS
Are you better off building your own custom models?

DIY MODELS: PROCESS

  Data
  Library
  Model

DIY MODELS: BABY PRODUCTS

  DIY SVM                93.4%
  Best off-the-shelf     92.6%
  DIY Naive Bayes        86.3%

DIY MODELS: ANDROID APPS

  DIY SVM                93.1%
  Best off-the-shelf     90.6%
  DIY Naive Bayes        90.3%

DIY MODELS: YELP REVIEWS

  DIY SVM                95.6%
  Best off-the-shelf     92.9%
  DIY Naive Bayes        89.2%

DIY MODELS: BABY PRODUCTS
  [figure]

DIY MODELS: ANDROID APPS
  [figure]

DIY MODELS: CONCLUSIONS

  You need sufficient relevant data to build a good model.
  DIY models built with sufficient data will typically outperform off-the-shelf solutions.
  DIY models may or may not be worth the effort.

3. CONCLUSIONS

CONCLUSIONS

  Off-the-shelf                                DIY
  no data                                      lots of data
  little effort                                more effort
  good quality possible, but not guaranteed    superior quality
  no control                                   full control

ALL MODELS ARE WRONG BUT SOME ARE USEFUL
- George Box

THANKS!
Any questions?
You can find me at:
  » @yvespeirsman
  » [email protected]
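
The advice to compare available solutions on your own data can be sketched as a small accuracy harness. This is an illustration under assumptions, not the evaluation code behind the numbers in this deck: the two classifiers are hypothetical stand-ins for real services or models, and the labelled examples are invented.

```python
def accuracy(classify, examples):
    """Share of (text, gold_label) pairs where classify(text) equals gold_label."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(1 for text, gold in examples if classify(text) == gold)
    return correct / len(examples)


def rank_solutions(solutions, examples):
    """Return (name, accuracy) pairs sorted from best to worst."""
    scores = {name: accuracy(fn, examples) for name, fn in solutions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical stand-in for an off-the-shelf or DIY model.
def keyword_classifier(text):
    negative_cues = ("bad", "worst", "broken")
    lowered = text.lower()
    return "negative" if any(cue in lowered for cue in negative_cues) else "positive"


# Hypothetical trivial classifier, useful as a sanity check.
def always_positive(text):
    return "positive"


# Invented labelled examples standing in for your own data.
labelled = [
    ("Great stroller, easy to fold", "positive"),
    ("Bad customer service", "negative"),
    ("Broken after two days", "negative"),
    ("Does the job", "positive"),
]

ranking = rank_solutions(
    {"keyword": keyword_classifier, "always_positive": always_positive},
    labelled,
)
for name, score in ranking:
    print(f"{name}: {score:.1%}")
```

In practice each entry in the dictionary would wrap a call to one candidate service or model, and all candidates would be scored on the same held-out examples.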