Machine Learning V1.1
An Introduction to Machine Learning v1.1
E. J. Sagra

Agenda
● Why is Machine Learning in the News again?
● Artificial Intelligence vs Machine Learning vs Deep Learning
● Artificial Intelligence
● Machine Learning & Data Science
● Machine Learning
● Data
● Machine Learning - By The Steps
● Tasks that Machine Learning solves
  ○ Classification
  ○ Cluster Analysis
  ○ Regression
  ○ Ranking
  ○ Generation

Agenda (cont...)
● Model Training
  ○ Supervised Learning
  ○ Unsupervised Learning
  ○ Reinforcement Learning
● Reinforcement Learning - Going Deeper
  ○ Simple Example
  ○ The Bellman Equation
  ○ Deterministic vs. Non-Deterministic Search
  ○ Markov Decision Process (MDP)
  ○ Living Penalty
● Machine Learning - Decision Trees
● Machine Learning - Augmented Random Search (ARS)

Why is Machine Learning In The News Again?

Processing capabilities
● GPUs etc. have reached a level where Machine Learning / Deep Learning is practical
● Cloud computing gives even individuals the capability to create / train complex models on vast data sets

Memory (Hard Drive (now SSD) as well as RAM)
● Speed / capacity increasing
● Cost decreasing

Data
● Volume of Data
● Access to vast public data sets

General
● Tools / Languages / Automation
● Need for Data Science no longer limited to the tech giants
● Education is behind in creating Data Scientists
● Organizing data is hard. Organizations are challenged
● High demand due to lack of qualified talent

Artificial Intelligence vs Machine Learning vs Deep Learning
Artificial Intelligence is the all-encompassing concept that emerged first.
Machine Learning followed and thrived later.
Finally, Deep Learning is escalating the advances of Artificial Intelligence to another level.

Artificial Intelligence
Artificial intelligence (AI) is perhaps the most vaguely understood field of data science. The main idea behind building AI is to use pattern recognition and machine learning to build an agent able to think and reason as humans do (or approach this ability).
Challenge: The term is so widely used that we haven’t yet agreed on how to interpret the "I" in AI. Intelligence is hard to formalize, and there are numerous ways to determine it.

Artificial Intelligence
For example:
● In business language, AI can be interpreted as the ability to solve new problems. Effectively, solving new problems is the outcome of perception, generalizing, reasoning, and judging.
● In the public view, AI is usually conceived as the ability of machines to solve problems related to many fields of knowledge. This would make them somewhat similar to humans. This concept of AGI (Artificial General Intelligence) remains in the realm of science fiction and does not match the existing state of the art.
● Famous systems such as AlphaGo, IBM Watson, or Libratus (Texas Hold’em) are representative of ANI (Artificial Narrow Intelligence). They specialize in one area and can perform tasks based on similar techniques to process data. Scaling from ANI to AGI is an endeavor that data science has yet to achieve.

Machine Learning: Programs that Alter Themselves
Machine learning is a subset of artificial intelligence. That is, all machine learning counts as artificial intelligence, but not all artificial intelligence counts as machine learning. For example, symbolic logic (rules engines, expert systems and knowledge graphs) could all be described as artificial intelligence, and none of them are machine learning.
One aspect that separates machine learning from knowledge graphs and expert systems is its ability to modify itself when exposed to more data; i.e. machine learning is dynamic and does not require human intervention to make certain changes. That makes it less brittle and less reliant on human experts.

Machine Learning & Data Science
Machine learning and statistics are part of data science. The word learning in machine learning means that the algorithms depend on some data, used as a training set, to fine-tune some model or algorithm parameters.
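As a minimal sketch of what "fine-tuning model parameters from a training set" can look like, the Python below fits a straight line to a handful of points by ordinary least squares. The function name fit_line and the sample data are illustrative assumptions and do not come from the slides.

```python
def fit_line(training_set):
    """Fit y ~= slope * x + intercept by ordinary least squares.

    training_set is a list of (x, y) pairs; the two returned numbers are
    the model parameters "learned" from the data.
    """
    n = len(training_set)
    mean_x = sum(x for x, _ in training_set) / n
    mean_y = sum(y for _, y in training_set) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in training_set)
    sxx = sum((x - mean_x) ** 2 for x, _ in training_set)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training set (made up for illustration).
training_set = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]
slope, intercept = fit_line(training_set)

# The fitted parameters can now be used to predict an unseen input.
prediction = slope * 5.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction for x=5: {prediction:.2f}")
```

Nothing in fit_line is specific to this data: supplying a different training set yields different parameters without changing the code, which is the sense of "learning" used here.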
Learning from a training set in this way encompasses many techniques, such as regression, naive Bayes or supervised clustering. But not all techniques fit in this category. For instance, unsupervised clustering (a statistical and data science technique) aims at detecting clusters and cluster structures without any prior knowledge or training set to help the classification algorithm. A human being is needed to label the clusters found. Some techniques are hybrid, such as semi-supervised classification. Some pattern detection or density estimation techniques fit in this category.

Machine Learning
Machine Learning uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.
Evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms go beyond following strictly static program instructions by making data-driven predictions or decisions, through building a model from sample inputs.
Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible. Example applications include email filtering, optical character recognition (OCR) and computer vision.
While it seems that data mining and KDD (knowledge discovery in databases) solely address the main problem of data science, machine learning adds business efficiency to it.

Machine Learning
Machine learning is similar to data mining in that it is about creating algorithms to extract valuable insights; however, it is heavily focused on continuous use in dynamically changing environments and emphasizes adjustments, retraining, and updating of algorithms based on previous experiences. The goal of machine learning is to constantly adapt to new data and discover new patterns or rules in it.
Sometimes this can be achieved without human guidance and explicit reprogramming.

Machine Learning
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.” - Tom Mitchell
“Field of study that gives computers the ability to learn without being explicitly programmed.” - Arthur Samuel (1959)

Machine Learning - How?
The main difference between machine learning and conventionally programmed algorithms is the ability to process data without being explicitly programmed. This means that an engineer isn’t required to provide elaborate instructions to a machine on how to treat each type of data record. Instead, a machine defines these rules itself, relying on input data.
Regardless of the particular machine learning application, the general workflow remains the same and is repeated iteratively once the results become dated or need higher accuracy.
The core artifact of any machine learning execution is a mathematical model, which describes how an algorithm processes new data after being trained with a subset of historic data. The goal of training is to develop a model capable of formulating a target value (attribute), some unknown value of each data object. While this sounds complicated, it really isn’t.

Machine Learning - Example
For example, you need to predict whether customers of your eCommerce store will make a purchase or leave. These predictions ("buy" or "leave") are the target attributes that we are looking for. To train a model to make this type of prediction, you “feed” an algorithm a dataset that stores different records of customer behaviors and the results (whether customers left or made a purchase). By learning from this historic data, a model will be able to make predictions on future data.

Data - How Much Do I Need?
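Before turning to rules of thumb, a small experiment can make the question concrete. The sketch below reuses the eCommerce buy-or-leave example from the previous slide: it trains a very simple model on increasing numbers of historic records and measures accuracy on held-out records. Everything in it is an assumption made for illustration: the two features (pages viewed, minutes on site), the synthetic data generator, and the nearest-centroid model. It shows only how accuracy can depend on training set size for one model; it does not compare linear and nonlinear algorithms.

```python
import random


def make_records(n, rng):
    """Generate n hypothetical customer records as ((pages, minutes), label)."""
    records = []
    for _ in range(n):
        bought = rng.random() < 0.5
        # Assumption: buyers tend to view more pages and stay longer.
        center = (8.0, 12.0) if bought else (4.0, 6.0)
        features = (rng.gauss(center[0], 2.5), rng.gauss(center[1], 4.0))
        records.append((features, 1 if bought else 0))
    return records


def train_centroids(records):
    """'Training': compute the mean feature vector of each class seen."""
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    counts = {0: 0, 1: 0}
    for (pages, minutes), label in records:
        sums[label][0] += pages
        sums[label][1] += minutes
        counts[label] += 1
    # Skip a class if the training records happen to contain none of it.
    return {c: (s[0] / counts[c], s[1] / counts[c])
            for c, s in sums.items() if counts[c] > 0}


def predict(centroids, features):
    """Predict the label whose centroid is closest to the given features."""
    def squared_distance(label):
        cx, cy = centroids[label]
        return (features[0] - cx) ** 2 + (features[1] - cy) ** 2
    return min(centroids, key=squared_distance)


def accuracy(centroids, records):
    hits = sum(1 for features, label in records
               if predict(centroids, features) == label)
    return hits / len(records)


def learning_curve(sizes, seed=0):
    """Return (training size, held-out accuracy) pairs for each size."""
    rng = random.Random(seed)
    test_set = make_records(2000, rng)
    pool = make_records(max(sizes), rng)
    return [(n, accuracy(train_centroids(pool[:n]), test_set)) for n in sizes]


if __name__ == "__main__":
    for n, acc in learning_curve([5, 20, 100, 1000]):
        print(f"{n:5d} training records -> held-out accuracy {acc:.3f}")
```

On real data, the same kind of curve (accuracy against number of training records) is a practical way to judge whether collecting more data is likely to help.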
For most Machine Learning algorithms / approaches, data is the essential ingredient. But how much will I need?
No one can really tell you. However, the more powerful machine learning algorithms (often referred to as nonlinear algorithms) generally require more data. These algorithms are often more flexible and even nonparametric (they can figure out how many parameters are required to model your problem, in addition to the values of those parameters). They are also high-variance, meaning predictions vary based on the specific data used to train them. This added flexibility and power comes at the cost of requiring more training data, often a lot more data.
E.g. if a linear algorithm achieves good performance with hundreds of examples per class, you may need thousands of examples per class for a nonlinear algorithm, like random forest or an artificial neural network.
Some nonlinear algorithms, like deep learning methods, can continue to improve in skill as you give them more data.

Data - Challenges
There may be a number of reasons that prevent you from obtaining data for your analysis, or make it more challenging. For example:
● Security and access
● Privacy
● Compliance
● Anonymized data
● IP protection
● Barriers (physical and virtual)
Finally, the format and structure of the data need to be considered. E.g. when reviewing currency rates from the Federal Reserve going back 40 years, there will be a discontinuity from 1999 onwards, since the euro replaced many European currencies.

Data - Characteristics
Data must be thought of as a building block for information and analytics. It must be collected to answer a question or set of questions.