Between Custom and Off-the-shelf NLP

Yves Peirsman NLP LANDSCAPE 2

Cloud APIs: train your own Cloud APIs: models Libraries: pre-trained + train your own models Libraries: models pre-trained models SENTIMENT ANALYSIS 3

“The process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc., is positive, negative, or neutral.” SENTIMENT ANALYSIS: APPLICATIONS 4

http://tarsier.monkeylearn.com SENTIMENT ANALYSIS: APPLICATIONS 5

http://varianceexplained.org/r/trump-tweets/ SENTIMENT ANALYSIS: APPLICATIONS 6

http://www.stockfluence.com/ OVERVIEW 7

Off-the-shelf NLP DIY NLP Conclusions NLP LANDSCAPE 8

Cloud APIs: train your own Cloud APIs: models Libraries: pre-trained + train your own models Libraries: models pre-trained models DATA SETS 9

Domain Source Categories Baseline

Movie reviews rottentomatoes.com positive, negative 50.0%

positive (****, *****), Baby products Amazon 81.5% negative (*, **)

positive (****, *****), Android apps Amazon 72.7% negative (*, **)

positive (****, *****), Android apps Amazon, Wikipedia 51.7% negative (*, **), neutral

positive (****, *****), Hotels, restaurants Yelp 70.8% negative (*, **) DATA SETS: EXAMPLES 10

Positive Neutral Negative If you're looking for is a series of It's Starbucks only something scary, demoware role-playing with bad customer this is the first great video games by Jeff Vogel service. Baristas with horror film of the of Spiderweb Software attitude that don't spooky season. available for and know their own Windows-based computers. product. If I'm paying Several are available for $7.00 for a coffee at iPad and Android tablet. least drop the 'tude. 11 1.

OFF-THE-SHELF NLP Is it OK to be lazy? OFF-THE-SHELF MODELS 12

Variables Data OFF-THE-SHELF MODELS: MOVIE REVIEWS 13

Indico 76.8%

IBM AlchemyAPI 73.4%

Stanford CoreNLP 71.9% OFF-THE-SHELF MODELS: BABY PRODUCTS 14

Indico 92.6%

MonkeyLearn 87.5% (Product)

TextBlob Pattern 82.5% OFF-THE-SHELF MODELS: YELP REVIEWS 15

Indico 92.9%

Google 91.0%

IBM AlchemyAPI 90.4% OFF-THE-SHELF MODELS: ANDROID APPS (2-WAY) 16

Indico 90.6%

Google 90.5%

MonkeyLearn 87.1% (Product) OFF-THE-SHELF MODELS: ANDROID APPS (3-WAY) 17

Indico 80.0%

HavenOnDemand 79.1%

Google 77.8% OFF-THE-SHELF MODELS: CONCLUSIONS 18

There is enormous High quality is Comparing available variation between and possible, but not solutions on your within off-the-shelf guaranteed. data is crucial. solutions. 19 2.

DIY MODELS Are you better off building your own custom models? DIY MODELS: PROCESS 20

Data Library Model DIY MODELS: PROCESS 21 DIY MODELS: BABY PRODUCTS 22

DIY SVM 93.4%

Best off-the-shelf 92.6%

DIY Naive Bayes 86.3% DIY MODELS: ANDROID APPS 23

DIY SVM 93.1%

Best off-the-shelf 90.6%

DIY Naive Bayes 90.3% DIY MODELS: YELP REVIEWS 24

DIY SVM 95.6%

Best off-the-shelf 92.9%

DIY Naive Bayes 89.2% DIY MODELS: BABY PRODUCTS 25

DIY MODELS: ANDROID APPS 26 DIY MODELS: CONCLUSIONS 27

You need sufficient DIY models built with DIY models may or relevant data to build sufficient data will may not be worth the a good model. typically outperform effort. off-the-shelf solutions. 28 3.

Conclusions CONCLUSIONS 29

Off-the-shelf DIY no data lots of data little effort more effort

good quality possible, but superior quality not guaranteed full control no control 30

ALL MODELS ARE WRONG BUT SOME ARE

USEFUL - George Box 31 THANKS! 32 Any questions?

You can find me at: » @yvespeirsman » [email protected]