Unsupervised Learning

Unsupervised Learning CS 3793/5233 Artificial Intelligence Unsupervised Learning – 1 Clustering Clustering In clustering, the target feature is not given. ⊲ Clustering EM k-Means Goal: Construct a natural classification that Procedure Example Data can be used to predict features of the data. Random Assignment The examples are partitioned in into clusters Assign 1 Assign 2 or classes. Properties Soft k-means Each class predicts values of the features for Example Properties the examples in the class. Reinforcement Learning In hard clustering, each example is placed Learning Bayesian Networks definitively in a class. In soft clustering, each example has a probability of belonging to each class. The best clustering minimizes an error measure. CS 3793/5233 Artificial Intelligence Unsupervised Learning – 2 EM Algorithm Clustering Clustering The EM (Expectation Maximization) algorithm ⊲ EM k-Means is not an algorithm, but is an algorithm design Procedure Example Data technique. Random Assignment Assign 1 Start with a hypothesis space for classifying Assign 2 Properties the data and a random hypothesis. Soft k-means Example Repeat until convergence: Properties Reinforcement – E Step. Classify the examples using the Learning Learning Bayesian current hypothesis. Networks – M Step. Learn a new hypothesis from the examples using their current classification. This can get stuck in local optima; different initializations can affect the result. CS 3793/5233 Artificial Intelligence Unsupervised Learning – 3 k-Means Algorithm Clustering Clustering The k-means algorithm is used for hard EM ⊲ k-Means clustering. Procedure Example Data Inputs: Random Assignment Assign 1 – training examples Assign 2 Properties – the number of classes/clusters, k Soft k-means Example Properties Outputs: Reinforcement Learning – Each example is assigned to one class. Learning Bayesian Networks – The average/mean example of each class. If example e = (x1,...,xn) is assigned to class i with mean ui = (ui1,...,uin), error is 2 n 2 ke − uik = Pj=1(xj − uij) CS 3793/5233 Artificial Intelligence Unsupervised Learning – 4 k-Means Procedure Clustering Clustering EM Procedure K-Means(E,k) k-Means Inputs: set of examples and number of classes ⊲ Procedure Example Data Random Assignment Randomly assign each example to a class Assign 1 Assign 2 Let Ei be the examples in class i Properties Repeat Soft k-means Example M-Step: Properties Reinforcement for each class i from 1 to k Learning Learning Bayesian u[i] ← Pe∈Ei e/|Ei| Networks E-Step: for each example e in E 2 put e in class arg mini ku[i] − ek until no changes in any Ei return u and Ei clusters CS 3793/5233 Artificial Intelligence Unsupervised Learning – 5 Example Data Clustering Clustering EM k-Means Procedure ⊲ Example Data Random Assignment Assign 1 Assign 2 Properties Soft k-means Example Properties Reinforcement Learning Learning Bayesian Networks CS 3793/5233 Artificial Intelligence Unsupervised Learning – 6 Random Assignment to Classes Clustering Clustering EM k-Means Procedure Example Data Random ⊲ Assignment Assign 1 Assign 2 Properties Soft k-means Example Properties Reinforcement Learning Learning Bayesian Networks CS 3793/5233 Artificial Intelligence Unsupervised Learning – 7 Assign to Closest Mean Clustering Clustering EM k-Means Procedure Example Data Random Assignment ⊲ Assign 1 Assign 2 Properties Soft k-means Example Properties Reinforcement Learning Learning Bayesian Networks CS 3793/5233 Artificial Intelligence Unsupervised Learning – 8 Assign to Closest Mean Again Clustering Clustering EM k-Means Procedure Example Data Random Assignment Assign 1 ⊲ Assign 2 Properties Soft k-means Example Properties Reinforcement Learning Learning Bayesian Networks CS 3793/5233 Artificial Intelligence Unsupervised Learning – 9 Properties of k-Means Clustering Clustering An assignment of examples to classes is stable EM k-Means if running both the M step and the E step Procedure Example Data does not change the assignment. Random Assignment Assign 1 This algorithm will converge to a stable local Assign 2 minimum. ⊲ Properties Soft k-means It is not guaranteed to converge to a global Example Properties minimum. Reinforcement Learning It is sensitive to the relative scale of the Learning Bayesian Networks dimensions. Increasing k can always decrease error until k is the number of different examples. CS 3793/5233 Artificial Intelligence Unsupervised Learning – 10 Soft k-Means Clustering Clustering To illustrate soft clustering, consider a “soft” EM k-Means k-means algorithm. Procedure Example Data E-Step: For each example e, calculate Random Assignment Assign 1 probability distribution P (class i | e) Assign 2 Properties 2 ⊲ Soft k-means P (ci | e) ∝ exp{−kui − ek } Example Properties Reinforcement Learning M-Step: For each class i, determine mean Learning Bayesian probabilistically. Networks Pe∈E P (ci | e) ∗ e ui = Pe∈E P (ci | e) CS 3793/5233 Artificial Intelligence Unsupervised Learning – 11 Soft k-Means Example Clustering Clustering e P0(Cx | e) P1(Cx | e) P2(Cx | e) EM (0.7, 5.1) 0.0 0.013 0.0 k-Means Procedure (1.5, 6.0) 1.0 0.764 0.0 Example Data Random Assignment (2.1, 4.5) 1.0 0.004 0.0 Assign 1 (2.4, 5.5) 0.0 0.453 0.0 Assign 2 Properties (3.0, 4.4) 0.0 0.007 0.0 Soft k-means (3.5, 5.0) 1.0 0.215 0.0 ⊲ Example Properties (4.5, 1.5) 0.0 0.000 0.0 Reinforcement Learning (5.2, 0.7) 0.0 0.000 0.0 Learning Bayesian (5.3, 1.8) 0.0 0.000 0.0 Networks (6.2, 1.7) 0.0 0.000 0.0 (6.7, 2.5) 1.0 0.000 0.0 (8.5, 9.2) 1.0 1.000 1.0 (9.1, 9.7) 1.0 1.000 1.0 (9.5, 8.5) 0.0 1.000 1.0 CS 3793/5233 Artificial Intelligence Unsupervised Learning – 12 Properties of Soft Clustering Clustering Clustering Soft clustering often uses a parameterized EM k-Means probability model, e.g., means and standard Procedure Example Data deviations for normal distribution. Random Assignment Assign 1 Initially, assign random probabilities to the Assign 2 Properties examples: prob. of class i given example e. Soft k-means Example The M-step updates the values of the ⊲ Properties parameters from the probabilities. Reinforcement Learning The E-step updates the probabilities of the Learning Bayesian Networks examples from the probability model. Does not guarantee global minimum. CS 3793/5233 Artificial Intelligence Unsupervised Learning – 13 Reinforcement Learning Clustering What should an agent do given: Reinforcement Learning ⊲ Introduction Prior knowledge: possible states of the world Why hard? Temporal possible actions Differences Example Q Review Observations: current state of world Q-Learning Update immediate reward/punishment Robot Q-Learner Problems SARSA Goal: act to maximize accumulated reward SARSA on Cliff Features We assume there is a sequence of experiences: Learning Bayesian Networks state, action, reward, state, action, reward, ... At any time agent must decide whether to explore to gain more knowledge, or exploit knowledge it has already discovered CS 3793/5233 Artificial Intelligence Unsupervised Learning – 14 Why is reinforcement learning hard? Clustering What actions are responsible for a reward may Reinforcement Learning Introduction have occurred a long time before the reward ⊲ Why hard? Temporal was received. Differences Example The long-term effect of an action depends on Q Review Q-Learning what the agent will do in the future. Update Robot Q-Learner The explore-exploit dilemma: at each time Problems SARSA should the agent be greedy or inquisitive? SARSA on Cliff Features – The ǫ-greedy strategy is to select what Learning Bayesian Networks looks like the best action 1 − ǫ of the time, and to select a random action ǫ of the time. CS 3793/5233 Artificial Intelligence Unsupervised Learning – 15 Temporal Differences Clustering Suppose we have a sequence of values Reinforcement Learning 1 2 3 Introduction v , v , v ,... Why hard? Temporal Estimating the average with the first k values: ⊲ Differences Example v1 + · · · + vk Q Review Ak = Q-Learning k Update Robot Q-Learner Separating out v : Problems k SARSA SARSA on Cliff Ak = (v1 + · · · + vk−1)/k + vk/k Features Learning Bayesian Networks Let α = 1/k, then Ak = (1−α)Ak−1 +αvk = Ak−1 +α(vk −Ak−1) The TD update is: A ← A + α(v − A) CS 3793/5233 Artificial Intelligence Unsupervised Learning – 16 Reinforcement Learning Example Clustering Suppose a robot in this Reinforcement Learning Introduction environment. +1 Why hard? Temporal Differences One terminal square has +1 ⊲ Example −1 Q Review reward (recharge station). Q-Learning Update One terminal square has −1 Robot Q-Learner Problems reward (falling down stairs). SARSA SARSA on Cliff Features An action to stay put always succeeds. Learning Bayesian Networks An action to move to a neighbor square, succeeds with probability 0.8, stays in the same square with prob. 0.1, goes to another neighbor with prob. 0.1 Should the robot try moving left or right? CS 3793/5233 Artificial Intelligence Unsupervised Learning – 17 Review of Q Values Clustering A policy is a function from states to actions. Reinforcement Learning 1 2 discounted Introduction For reward sequence r ,r ,..., Why hard? ∞ i−1 Temporal reward is: V = Pi=1 γ ri (discount = γ) Differences Example V (s) is expected value of state s. ⊲ Q Review Q-Learning Q(s, a) is value of action a from s. Update Robot Q-Learner For optimal policy: Problems SARSA V (s) = maxa Q(s, a) (value of best action) SARSA on Cliff Features ′ ′ ′ Learning Bayesian Q(s, a)=Ps′ P (s |s, a)(R(s, a, s )+γV (s )) = Networks ′ ′ ′ ′ ′ Ps′ P (s |s, a)(R(s, a, s )+γ maxa Q(s ,a )) Learn optimal policy by learning Q values. Use each experience s, a, r, s′ to update Q[s, a].

Load more