Machine Learning Logistics Book by Ted Dunning & Ellen Friedman © 2018 (O’Reilly)

Beyond the Algorithm: What Makes Machine Learning Work?
Ellen Friedman, PhD
11 June 2018, Berlin Buzzwords #bbuzz
© 2018 Ellen Friedman

Contact Information
Ellen Friedman, PhD
Principal Technologist, MapR Technologies
Committer, Apache Drill & Apache Mahout projects
O’Reilly author
Twitter @Ellen_Friedman

What makes machine learning work?

+ = ?

Data Engineer Ian Downard
Had never tried machine learning, but he caught the bug…

Image Recognition: Which Bird Is This?
Left to right: Rhode Island Red, Buff Orpington, Jay

More to the point…
Chicken, Chicken, Not a Chicken

Domain Knowledge Matters
Chicken, Chicken, Predator

Images from Wikipedia, used under Creative Commons:
https://en.wikipedia.org/wiki/File:Rhode_Island_Red_cock,_cropped.jpg
https://upload.wikimedia.org/wikipedia/commons/7/74/Barred_Plymouth_Rock_Rooster_001.jpg
https://en.wikipedia.org/wiki/Aphelocoma#/media/File:WesternScrubJay2.jpg
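The shift from breed names to “chicken” vs “predator” is a domain decision layered on top of the classifier, not something the model learns by itself. A minimal Python sketch, assuming a classifier that returns one of the three slide labels as a string; the `ACTION_CLASS` table and `to_action_class` helper are hypothetical and not part of the Tensor Chicken project.

```python
# Hypothetical mapping from the classifier's fine-grained labels to the
# classes that actually matter in the henhouse (a domain-knowledge decision).
ACTION_CLASS = {
    "Rhode Island Red": "chicken",
    "Buff Orpington": "chicken",
    "Jay": "predator",
}


def to_action_class(label: str) -> str:
    """Collapse a breed/species label into the class we act on.

    Labels the mapping does not know fall back to "unknown" rather than
    silently being treated as a chicken.
    """
    return ACTION_CLASS.get(label, "unknown")


if __name__ == "__main__":
    for label in ["Rhode Island Red", "Buff Orpington", "Jay", "Robin"]:
        print(label, "->", to_action_class(label))
```

One consequence of this design: confusing one chicken breed for another leaves the action class unchanged, so a wrong breed name need not matter downstream.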
Tensor Chicken
Deep learning project using the Inception v3 model from TensorFlow (see blog + @tensorchicken)
[Workflow diagram: gather training data (image files) → label training data → train model → deploy model → run the model → update model]

Value from ML: what about SLAs?
How long does it take for the image recognition model to classify an image?
• ~30 seconds, because it is just running on a Raspberry Pi
How long does it take for a Scrub Jay to peck an egg?
• < 30 seconds
• Oops…

Value from ML: what about action?
When image classification indicates a jay in the henhouse, what would you do to chase away the predator?
• Not sure yet. That’s a problem…

What makes a difference for impact?
Left: labelled as Buff Orpington. Right: this is what a Buff Orpington really looks like.
But… this error in domain knowledge (wrong name) did not matter for SLAs.
Images from Wikipedia, used under Creative Commons:
https://upload.wikimedia.org/wikipedia/commons/7/74/Barred_Plymouth_Rock_Rooster_001.jpg
https://en.wikipedia.org/wiki/Orpington_chicken#/media/File:Coq_orpington_fauve.JPG

What lessons can we learn from this toy project?
• Image recognition & deep learning are cool
• In some cases, building or training a model is simple
• Domain knowledge matters (really)
• Pay attention to SLAs
• For real business value, have a plan of action in response to machine learning insights (producing a report ≠ taking an action)
• Software engineers have a role in machine learning (!)

What about real-world examples?
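Looking back at the Tensor Chicken SLA slide, the lesson can be stated as a latency budget: classification plus response must finish before the jay reaches the egg. A minimal Python sketch; `classify` stands in for any model call, the response time is a hypothetical parameter (the slides note there was no action plan yet), and 30 seconds is the rough figure from the slides rather than a measurement.

```python
import time
from typing import Callable, Tuple


def timed_classify(classify: Callable[[bytes], str], image: bytes) -> Tuple[str, float]:
    """Run one classification and return (label, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    label = classify(image)
    return label, time.perf_counter() - start


def meets_sla(classify_s: float, respond_s: float, window_s: float) -> bool:
    """True only if detection plus response finishes inside the window.

    classify_s: time to classify the image
    respond_s:  time to take the action (e.g. scare off the jay)
    window_s:   time before the damage is done (the jay reaches the egg)
    """
    return classify_s + respond_s < window_s


if __name__ == "__main__":
    # Dummy classifier standing in for the real model on the Raspberry Pi.
    label, elapsed = timed_classify(lambda img: "Jay", b"fake-image-bytes")
    print(label, f"{elapsed:.6f}s")
    # Slide figures: ~30 s to classify; the jay needs less than 30 s.
    print(meets_sla(classify_s=30.0, respond_s=0.0, window_s=30.0))  # False
```

Even with zero response time, a 30-second classification cannot beat a window shorter than 30 seconds, which is the “Oops” on the slide.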
Domain Knowledge Matters: Video Recommender
• Use clicks as input data: the recommender performs poorly
– The model is testing the wrong preference: how well people liked the titles
• Use the first 30 seconds of viewing as input data: the recommender performs well
– The model now tests how well people liked the videos, not just the titles

Domain Knowledge Matters: Detecting Security Attacks
A security expert at a bank preserved the headers of web site requests.

Spot the Important Difference?
Attacker request:
GET /personal/comparison-table.jsp?iODg2OQ=51a90 HTTP/1.1
Host: www.sometarget.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)
Accept-Encoding: deflate
Accept-Charset: UTF-8
Accept-Language: fr
Cache-Control: no-cache
Pragma: no-cache
Connection: Keep-Alive

Real request:
GET /photo.jpg HTTP/1.1
Host: lh4.googleusercontent.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:44.0) Gecko/20100101 Firefox/44.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://www.google.com
Connection: keep-alive
If-None-Match: "v9"
Cache-Control: max-age=0
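One simple way to turn “different, though allowable” into a feature is to fingerprint each request by the names of its headers, in order and with their original casing, and flag fingerprints never seen in known-good traffic. A minimal Python sketch using the two requests above; this exact-match approach and the helper names are illustrative assumptions, not the bank expert’s actual method.

```python
from typing import Iterable, Set, Tuple

Fingerprint = Tuple[str, ...]


def header_fingerprint(raw_request: str) -> Fingerprint:
    """Header names in the order they were sent, with original casing.

    The request line (e.g. "GET /photo.jpg HTTP/1.1") is skipped; only
    "Name: value" lines contribute, and only the name part is kept.
    """
    lines = raw_request.strip().splitlines()[1:]
    return tuple(line.split(":", 1)[0].strip() for line in lines if ":" in line)


def learn_normal(requests: Iterable[str]) -> Set[Fingerprint]:
    """Collect the fingerprints seen in known-good traffic."""
    return {header_fingerprint(r) for r in requests}


def is_anomalous(raw_request: str, normal: Set[Fingerprint]) -> bool:
    """Flag a request whose header pattern never appeared in normal traffic."""
    return header_fingerprint(raw_request) not in normal


REAL = """GET /photo.jpg HTTP/1.1
Host: lh4.googleusercontent.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:44.0) Gecko/20100101 Firefox/44.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://www.google.com
Connection: keep-alive
If-None-Match: "v9"
Cache-Control: max-age=0"""

ATTACK = """GET /personal/comparison-table.jsp?iODg2OQ=51a90 HTTP/1.1
Host: www.sometarget.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)
Accept-Encoding: deflate
Accept-Charset: UTF-8
Accept-Language: fr
Cache-Control: no-cache
Pragma: no-cache
Connection: Keep-Alive"""

if __name__ == "__main__":
    normal = learn_normal([REAL])
    print(is_anomalous(REAL, normal))    # False
    print(is_anomalous(ATTACK, normal))  # True
```

Exact matching is deliberately crude: a new browser version would also be flagged. Because casing is kept, a request that sends `host:` instead of `Host:` produces a different fingerprint, even though HTTP treats header names as case-insensitive.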
Another Example
Real request:
GET photo.jpg HTTP/1.1
Host: lh4.googleusercontent
User-agent: Mozilla/5.0 (Ma
Accept: image/png,image/*
Accept-language: en-US,en
Accept-encoding: gzip, defl
Referer: https://www.google
Connection: keep-alive
If-none-match: "v9"
Cache-control: max-age=0

Attacker request:
GET cc/borken.json HTTP/1.1
host: c.qrs.my
user-agent: Mozilla/4.0 (co
accept: application/json, t
accept-language: en-US,en
accept-encoding: gzip, defl
referer: none
connection: keep-alive
if-none-match: "v9"
cache-control: max-age=0

Domain Knowledge Matters: Detecting Security Attacks
A security expert at a bank preserved the headers of web site requests.
• Detected an anomaly in the headers of attacker requests vs. normal (real) requests
• The attackers’ header patterns were allowable and not predictable, but they were different

Keep data: you don’t know what you’ll need to know later

Big Industry, Big Data, Big Value
(Image courtesy Mtell, used with permission; © Wes Abrams; all other images © E. Friedman)

Simple But Valuable: Accounting Audit Targeting
Big industrial company with a lot of machinery
• Tracking actions & parts
– labelled as repairs, contracted services, delivery of supplies, etc.
• Which are to be taxed, expensed or counted as revenue
– Mislabelling can cost millions of dollars
• Use machine learning to target potential mislabelling for audit review
• Relatively simple models (pattern matching; exception detection) deliver huge business value

Is it the algorithm? The model? The ML tool?
https://mapr.com/blog/tensorflow-mxnet-caffe-h2o-which-ml-best/

90% of the effort in successful machine learning isn’t the algorithm or the model…
It’s the logistics

What Does Streaming Do for You?
Surfer on standing wave, Munich (image © 2017 Ellen Friedman)

Stream transport supports microservices

At the Heart: Message Transport
With the right messaging tool at the heart of a stream-1st architecture, you support other classes of use cases (B & C).
[Diagram: a message stream at the center connecting EMR, medical test results, patient management, facilities, real-time analytics, insurance audit, and medical tests; use cases labelled A, B, and C]

Stream Transport that Decouples Producers & Consumers
[Diagram: producers (P) → Kafka / MapR Streams (transport) → consumers (C) (processing)]
Good stream transport is persistent, performant & pervasive!
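The decoupling works because producers only append to the stream and each consumer tracks its own read position. A minimal Python sketch of that idea; `Topic` and `Consumer` are hypothetical toy classes, not the Kafka or MapR Streams client APIs, and they leave out persistence to disk, replication, and partitioning.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topic:
    """Toy, in-memory append-only log standing in for a stream topic."""
    messages: List[str] = field(default_factory=list)

    def send(self, msg: str) -> int:
        """Producer side: append a message and return its offset."""
        self.messages.append(msg)
        return len(self.messages) - 1


@dataclass
class Consumer:
    """Consumer side: each consumer keeps its own read position (offset)."""
    topic: Topic
    offset: int = 0

    def poll(self, max_records: int = 10) -> List[str]:
        """Return up to max_records unread messages and advance the offset."""
        batch = self.topic.messages[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch


if __name__ == "__main__":
    events = Topic()
    for i in range(3):
        events.send(f"event-{i}")

    analytics = Consumer(events)
    print(analytics.poll(max_records=2))  # ['event-0', 'event-1']

    # A consumer added later still sees the full history, and reading it
    # does not disturb the first consumer's position.
    audit = Consumer(events)
    print(audit.poll())      # ['event-0', 'event-1', 'event-2']
    print(analytics.poll())  # ['event-2']
```

Adding the `audit` consumer required no change to the producer or to the `analytics` consumer; that independence is what lets new classes of use cases reuse a stream that already exists.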
Streaming Microservices
• “Streaming Microservices” by Ted Dunning & Ellen Friedman, chapter in Encyclopedia of Big Data Technologies, Sherif Sakr and Albert Zomaya, editors, © 2018 (Springer International Publishing)
• Chapter 3 of Streaming Architecture by Ted Dunning & Ellen Friedman © 2016 (O’Reilly Media)
https://mapr.com/ebooks/streaming-architecture/chapter-03-streaming-platform-for-microservices.html

Get rid of the myth of the unitary model

Streaming microservices provide flexibility & independence to manage many models

Logistics for machine learning can be difficult
• Just getting the training & input data is hard
• Many models to manage
• Model-to-model evaluation needs to be convenient & accurate
• Respond as the world changes: deploy to production with agile roll out & roll
• There’s a need for good data
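Model-to-model evaluation is most accurate when every model sees exactly the same inputs, so outputs and latencies can be compared record by record. A minimal Python sketch; the models are hypothetical stand-ins, and in a streaming setup the loop over models would more likely be replaced by each model consuming the same input stream.

```python
import time
from typing import Callable, Dict, List

Model = Callable[[str], str]
Record = Dict[str, Dict[str, object]]


def score_all(models: Dict[str, Model], x: str) -> Record:
    """Send the same input to every model and record output and latency."""
    record: Record = {}
    for name, model in models.items():
        start = time.perf_counter()
        output = model(x)
        record[name] = {"output": output, "latency_s": time.perf_counter() - start}
    return record


def agreement(log: List[Record], a: str, b: str) -> float:
    """Fraction of logged inputs on which models a and b gave the same output."""
    if not log:
        return 0.0
    same = sum(1 for r in log if r[a]["output"] == r[b]["output"])
    return same / len(log)


if __name__ == "__main__":
    # Hypothetical current and candidate models.
    models: Dict[str, Model] = {
        "current": lambda img: "chicken",
        "candidate": lambda img: "predator" if "jay" in img else "chicken",
    }
    log = [score_all(models, x) for x in ["hen1.jpg", "hen2.jpg", "jay.jpg"]]
    print(agreement(log, "current", "candidate"))  # 0.6666666666666666
```

Keeping the full per-input log, not just a summary score, makes it possible to inspect exactly where a candidate disagrees with the current model before rolling it out.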
