Machine Learning
MACHINE LEARNING
Azure Reference Architecture

Zoiner Tejada
[email protected]
@zoinertejada
• Solliance Founder, CEO
• Author
• Microsoft MVP – Microsoft Azure
• Azure Elite, Azure Insider
• CQURE Certified Security Professional
• Google Developer Expert (GDE)

AGENDA
You will learn:
• the key tools in the toolbox (data transformation, supervised learning modules, unsupervised learning modules)
• the value that Azure ML brings to the larger solution (such as classification, clustering and predictive analytics)
• how you train your model (if you have to at all) and how to validate your model
• how Azure ML integrates with your data pipeline

© DEVintersection. All rights reserved. http://www.DEVintersection.com

INTRO TO DATA SCIENCE
Keepin’ it stats-light

WHAT IS DATA SCIENCE
• Practice of obtaining insights from data
• Applies equally to small data and BIG data
• Structured and unstructured
• Multidisciplinary
  • Stats
  • Math
  • Operations
  • Signal processing
  • Linguistics
  • Database / Storage
  • Programming
  • Machine Learning
  • Scientific Computing

WHY NOW?
• Data has become a critical asset
• With volumes increasing, it’s getting increasingly harder to tease information and insight out of the data
  • Companies with more than 1k employees store an average of 235 TB of data
  • 50B connected devices expected by 2020
• Analyst expectations such as those from Gartner say it’s worth it
  • Organizations that invest in modern data infrastructure will financially outperform their peers by up to 20%
• Customers now expect data sophistication
  • Think “you might also like” on Amazon or Netflix’s recommended movies

ANALYTICS SPECTRUM
• Descriptive
• Diagnostic
• Predictive
• Prescriptive

DESCRIPTIVE ANALYTICS
• What is happening?
• Example
  • For a retail store, identify the customer segments for marketing purposes

DIAGNOSTIC ANALYTICS
• Why is it happening?
• Example
  • Understanding what factors are causing customers to leave a service (churn)

PREDICTIVE ANALYTICS
• What will happen?
• Example
  • Identify customers who are likely to upgrade to the latest phone

PRESCRIPTIVE ANALYTICS
• What should be done?
• Example
  • What’s the best offer to give to a customer who is likely to want that latest phone?

PROCESS
• Define the business problem
• Acquire and prepare data
• Develop the model
• Deploy the model
• Monitor model performance & tune

HOW DO MACHINES LEARN?
• The learning process is the same for humans and machines
• Divided into three components
  • Data input – use observation, memory, and recall to provide a factual basis for further reasoning
  • Abstraction – translate the data into broader representations
  • Generalization – use the abstraction to form a basis for action

KEY ML TERMS
• Knowledge representation
  • the formation of logical structures that assist with turning raw data into meaningful insights
• Observations/Examples
  • the raw data inputs, typically thought of as a tuple
• Features
  • an attribute or column in the example
• Model
  • how the computer summarizes the raw inputs
• Training
  • fitting a particular model to a dataset
• Over-fitting
  • a model that performs well on the training dataset, but poorly when tested with other data
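
The terms "training" and "over-fitting" above can be made concrete with a small sketch. It is not taken from the deck: the data, the helper names (fit_line, fit_memorizer, mse) and the alternating +1/-1 "noise" are all assumptions chosen so the result is deterministic. One model fits a straight line; the other simply memorizes every training example.

```python
# Minimal over-fitting sketch (illustrative assumptions only, standard library).

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs, ys):
    """Training: fit y = slope * x + intercept by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(xs, ys):
    """Training: store every example verbatim (an over-fit model)."""
    table = dict(zip(xs, ys))
    return lambda x: table[x]

# Underlying trend is y = 2x; the two datasets carry opposite "noise".
xs = list(range(10))
train_ys = [2 * x + (1 if x % 2 == 0 else -1) for x in xs]
test_ys = [2 * x + (-1 if x % 2 == 0 else 1) for x in xs]

line = fit_line(xs, train_ys)
memorizer = fit_memorizer(xs, train_ys)

line_train_mse = mse(line, xs, train_ys)
line_test_mse = mse(line, xs, test_ys)
memorizer_train_mse = mse(memorizer, xs, train_ys)
memorizer_test_mse = mse(memorizer, xs, test_ys)

print("line:      train", round(line_train_mse, 2), "test", round(line_test_mse, 2))
print("memorizer: train", memorizer_train_mse, "test", memorizer_test_mse)
```

The memorizer has zero error on the training data but an error of 4.0 on the other dataset, while the line's error stays close to 1 on both. That gap between training and test performance is the signature of over-fitting.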

COMMON TECHNIQUES
• Classification
• Clustering
• Regression
• Simulation
• Content Analysis
• Recommendation

SUPERVISED VS. UNSUPERVISED
• Refers to the requirements of the algorithm
  • Does it need to be “trained” on a set of data before it can provide conclusions?
• Supervised algorithms need to be carefully trained before they can be shown other examples and provide results
• Unsupervised algorithms do not require training; they provide results given the data at hand

CLASSIFICATION ALGORITHMS
• Classify people or things into groups
• They classify (or predict) a “label” for an example
• The set of possible outcomes is typically known in advance
• Tools include
  • Decision trees
  • Logistic regression
  • Neural networks
• Supervised learning
• Can provide not just the classification, but also how a particular classification was reached

CLUSTERING ALGORITHMS
• Dividing a set of examples into homogeneous groups
• While they also can predict a “label” for an example, they are applied when the labels are not known in advance
  • In other words, you are discovering what groups exist in the data
• Tools include
  • K-means clustering
• Unsupervised learning

PATTERN DETECTION ALGORITHMS
• Identify frequent associations in the data
• Tools include
  • Association rules
• Unsupervised learning

REGRESSION ALGORITHMS
• Predict numerical outcomes
• Inputs may be categorical or numerical, but the output is typically a number
• Tools include
  • Linear regression
  • Neural networks
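
To illustrate the unsupervised case described on the clustering slide, where groups are discovered without labels, here is a minimal k-means sketch. It is an assumption-laden illustration and not the deck's or Azure ML's implementation: the function name kmeans_1d is hypothetical, the data is one-dimensional, and the initial centroids are simply the first k points (real implementations usually pick them more carefully).

```python
# Minimal 1-D k-means sketch (illustrative assumptions only, standard library).

def kmeans_1d(points, k, max_iters=100):
    """Group numbers into k clusters; returns (centroids, labels)."""
    centroids = list(points[:k])  # assumption: seed with the first k points
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(sum(members) / len(members))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels

# No labels are supplied; the two groups are discovered from the values alone.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, labels = kmeans_1d(values, k=2)
print(centroids, labels)
```

With this input the first three values end up in one cluster and the last three in the other, with centroids near 1.0 and 5.0. No training set with known answers was needed, which is the contrast with the supervised classification and regression tools listed above.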

SIMULATION
• Model and optimize real-world processes
• Offers the opportunity to test many scenarios by adjusting model variables
• Tools include
  • Monte Carlo simulations
  • Markov chain analysis
  • Linear programming

CONTENT ANALYSIS
• Surface information and insights from content like text, audio and video
• Tools
  • Pattern recognition
  • Text mining
  • Image recognition
  • OCR

RECOMMENDATION
• Identify beneficial relationships and recommend items based on similarity between entities or between entities and items
• Common example is Amazon’s product recommendations
• Tools used
  • Collaborative filtering (similarity between users or between items)
  • Content analysis
  • Affinity (e.g. market basket analysis)

ENSEMBLE MODELS
• The latest approaches have realized
  • You can have a set of individually weak algorithms
  • Use them together to process data
  • The result can be far superior to even the best lone algorithm
• Tools used
  • Decision Forests (the data is split amongst many decision trees)
  • Boosted Decision Trees (the data in error is flowed through a chain of trees)

SUMMARY
• Defined data science and key machine learning terminology
• Described the data science process
• Enumerated the types of analytics
• Reviewed the many categories of algorithms

INTRO TO AZURE MACHINE LEARNING
Democratizing machine learning, with the power of the cloud

AZURE ML STUDIO
• Web-based UI for modeling experiments
• Typically requires an Azure account to design and run experiments

GUEST ACCESS
• Experiments can be shared with people who do not have an Azure account
• Guest access allows read-only viewing of experiments
• Does not allow them to be run

EXPERIMENTS
• The core “project” type in Azure ML Studio is the experiment
• Option for Blank
• Numerous templates/samples with which to get started

MODULES
• Experiments contain modules arranged in a flowchart fashion

MODULE HELP
• Getting help
  • Right-click a module and select Help to view documentation

MODULE COMMENTS
• Right-click on a module, choose Edit Comment
• Add free-form text to document what the module accomplishes in the context of the experiment
• You can collapse the comments by clicking on the chevron (up arrow)

MODULE CATEGORIES
[Figure: module categories. Labels recovered from the slide: Source Data; ML Modules; Operationalize Your Models; Don’t Use]

WINE QUALITY PREDICTION
• Type: Regression
• Candidate Algorithms:
  • Decision Tree
• Data Prep:
  • None
• Business Requirements:
  • Build a model that takes various characteristics of wine and predicts the quality score assigned by experts

DEMO
Tour of Azure ML Studio – A first experiment in Wine Quality

DATASET
• Data saved to your Azure ML workspace is stored as a dataset
• A Dataset is data that has been uploaded to Azure Machine Learning Studio
• Datasets are external to your experiment
• Azure ML provides ~40 sample datasets

DATATABLE
• Even if you upload data in another format, or specify a storage format such as CSV, ARFF, or TSV, the data is implicitly converted to a DataTable object whenever used by a module in an experiment.
•