
Unsupervised Learning

Clustering

• In clustering, the target feature is not given.
• Goal: construct a natural classification that can be used to predict features of the data.
• The examples are partitioned into clusters or classes.
• Each class predicts values of the features for the examples in the class.
• In hard clustering, each example is placed definitively in a class.
• In soft clustering, each example has a probability of belonging to each class.
• The best clustering minimizes an error measure.

EM

• EM (Expectation Maximization) is not a single algorithm, but an algorithm design technique.
• Start with a hypothesis space for classifying the data and a random hypothesis.
• Repeat until convergence:
  – E step: classify the examples using the current hypothesis.
  – M step: learn a new hypothesis from the examples using their current classification.
• This can get stuck in local optima; different initializations can affect the result.

k-Means Algorithm

• The k-means algorithm is used for hard clustering.
• Inputs:
  – training examples
  – the number of classes/clusters, k
• Outputs:
  – Each example is assigned to one class.
  – The average/mean example (centroid) of each class.

• If example $e = (x_1, \ldots, x_n)$ is assigned to class $i$ with mean $u_i = (u_{i1}, \ldots, u_{in})$, the error is $\|e - u_i\|^2 = \sum_{j=1}^{n} (x_j - u_{ij})^2$.

k-Means Procedure

Procedure K-Means(E, k)
  Inputs: set of examples E and number of classes k
  Randomly assign each example to a class
  Let Ei be the examples in class i
  Repeat
    M-step: for each class i from 1 to k
      u[i] ← (Σ_{e ∈ Ei} e) / |Ei|
    E-step: for each example e in E
      put e in class arg min_i ‖u[i] − e‖²
  until no changes in any Ei
  return u and the clusters Ei
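A minimal NumPy sketch of this procedure (the array layout, the random-seed handling, and the re-seeding of empty classes are assumptions of the sketch, not part of the slides):

```python
import numpy as np

def k_means(examples, k, rng=None, max_iters=100):
    """Hard k-means clustering following the procedure above.

    examples: (n, d) array of n examples with d numeric features.
    Returns the class means and the class index of each example.
    """
    rng = np.random.default_rng(rng)
    n = len(examples)
    # Randomly assign each example to one of the k classes.
    assignment = rng.integers(0, k, size=n)
    for _ in range(max_iters):
        # M-step: mean of the examples currently in each class
        # (an empty class is re-seeded from a random example).
        means = np.array([examples[assignment == i].mean(axis=0)
                          if np.any(assignment == i)
                          else examples[rng.integers(n)]
                          for i in range(k)])
        # E-step: put each example in the class with the closest mean.
        dists = ((examples[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new_assignment = dists.argmin(axis=1)
        if np.array_equal(new_assignment, assignment):  # no changes: stop
            break
        assignment = new_assignment
    return means, assignment
```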

Example Data

[Figure: scatter plot of the unlabeled example data.]

Random Assignment to Classes

[Figure: the example data after random assignment to classes.]

Assign to Closest Mean

[Figure: the example data after assigning each example to the closest class mean.]

Assign to Closest Mean Again

[Figure: the example data after a second round of assigning each example to the closest mean.]

Properties of k-Means

• An assignment of examples to classes is stable if running both the M step and the E step does not change the assignment.
• The algorithm converges to a stable local minimum.
• It is not guaranteed to converge to a global minimum.
• It is sensitive to the relative scale of the dimensions.
• Increasing k can always decrease the error, until k is the number of different examples.

Soft k-Means

• To illustrate soft clustering, consider a "soft" k-means algorithm.
• E-step: for each example e, calculate the probability distribution P(class i | e):
  $P(c_i \mid e) \propto \exp\{-\|u_i - e\|^2\}$
• M-step: for each class i, determine the mean probabilistically:

$u_i = \dfrac{\sum_{e \in E} P(c_i \mid e)\, e}{\sum_{e \in E} P(c_i \mid e)}$
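A minimal NumPy sketch of one round of these two steps (the array layout and the per-row shift for numerical stability are assumptions of the sketch):

```python
import numpy as np

def soft_k_means_step(examples, means):
    """One E-step and M-step of soft k-means.

    examples: (n, d) array; means: (k, d) array of current class means.
    Returns the responsibilities P(class i | e) and the updated means.
    """
    # E-step: P(c_i | e) proportional to exp(-||u_i - e||^2).
    sq_dists = ((examples[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    # Shifting by the row minimum does not change the normalized result.
    weights = np.exp(-(sq_dists - sq_dists.min(axis=1, keepdims=True)))
    probs = weights / weights.sum(axis=1, keepdims=True)
    # M-step: each mean is the probability-weighted average of the examples.
    new_means = (probs.T @ examples) / probs.sum(axis=0)[:, None]
    return probs, new_means
```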

Soft k-Means Example

e            P0(Cx | e)   P1(Cx | e)   P2(Cx | e)
(0.7, 5.1)   0.0          0.013        0.0
(1.5, 6.0)   1.0          0.764        0.0
(2.1, 4.5)   1.0          0.004        0.0
(2.4, 5.5)   0.0          0.453        0.0
(3.0, 4.4)   0.0          0.007        0.0
(3.5, 5.0)   1.0          0.215        0.0
(4.5, 1.5)   0.0          0.000        0.0
(5.2, 0.7)   0.0          0.000        0.0
(5.3, 1.8)   0.0          0.000        0.0
(6.2, 1.7)   0.0          0.000        0.0
(6.7, 2.5)   1.0          0.000        0.0
(8.5, 9.2)   1.0          1.000        1.0
(9.1, 9.7)   1.0          1.000        1.0
(9.5, 8.5)   0.0          1.000        1.0

Properties of Soft Clustering

• Soft clustering often uses a parameterized probability model, e.g., means and standard deviations for a normal distribution.
• Initially, assign random probabilities to the examples: the probability of class i given example e.
• The M-step updates the values of the parameters from the probabilities.
• The E-step updates the probabilities of the examples from the probability model.
• Convergence to a global minimum is not guaranteed.

Reinforcement Learning

What should an agent do given:
• Prior knowledge: the possible states of the world and the possible actions
• Observations: the current state of the world and the immediate reward/punishment
• Goal: act to maximize accumulated reward
• We assume there is a sequence of experiences: state, action, reward, state, action, reward, ...
• At any time the agent must decide whether to explore to gain more knowledge, or exploit the knowledge it has already discovered.

Why is reinforcement learning hard?

• The actions responsible for a reward may have occurred long before the reward was received.
• The long-term effect of an action depends on what the agent will do in the future.
• The explore-exploit dilemma: at each time, should the agent be greedy or inquisitive?
  – The ε-greedy strategy is to select what looks like the best action 1 − ε of the time, and to select a random action ε of the time (see the sketch below).
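A minimal sketch of the ε-greedy choice (the dictionary-of-Q-values representation is an assumption made only for illustration):

```python
import random

def epsilon_greedy(q_values, actions, epsilon=0.1):
    """Pick a random action with probability epsilon, otherwise the greedy one.

    q_values: dict mapping each action to its current Q estimate
    for the present state.
    """
    if random.random() < epsilon:
        return random.choice(actions)                  # explore
    return max(actions, key=lambda a: q_values[a])     # exploit
```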

Temporal Differences

• Suppose we have a sequence of values $v_1, v_2, v_3, \ldots$
• Estimating the average with the first k values:
  $A_k = \dfrac{v_1 + \cdots + v_k}{k}$
• Separating out $v_k$:
  $A_k = (v_1 + \cdots + v_{k-1})/k + v_k/k$
• Let $\alpha = 1/k$; then
  $A_k = (1-\alpha)A_{k-1} + \alpha v_k = A_{k-1} + \alpha(v_k - A_{k-1})$
• The TD update is: $A \leftarrow A + \alpha(v - A)$
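A tiny sketch of the TD update as code (the fixed step size and the sample values are only for illustration):

```python
def td_update(estimate, value, alpha):
    """One temporal-difference update: move the estimate a fraction
    alpha of the way toward the newly observed value."""
    return estimate + alpha * (value - estimate)

# Running estimate of a stream of values with a fixed step size.
average = 0.0
for v in [4.0, 6.0, 5.0, 7.0]:
    average = td_update(average, v, alpha=0.5)
```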

Reinforcement Learning Example

Suppose a robot is in a grid environment. One terminal square has +1 reward (a recharge station). One terminal square has −1 reward (falling down the stairs). An action to stay put always succeeds. An action to move to a neighboring square succeeds with probability 0.8, stays in the same square with probability 0.1, and goes to another neighbor with probability 0.1. Should the robot try moving left or right?

Review of Q Values

• A policy is a function from states to actions.
• For a reward sequence $r_1, r_2, \ldots$, the discounted reward is $V = \sum_{i=1}^{\infty} \gamma^{i-1} r_i$ (discount factor $\gamma$).
• $V(s)$ is the expected value of state $s$.
• $Q(s, a)$ is the value of doing action $a$ from state $s$.
• For the optimal policy:
  $V(s) = \max_a Q(s, a)$ (the value of the best action)
  $Q(s, a) = \sum_{s'} P(s' \mid s, a)\,(R(s, a, s') + \gamma V(s')) = \sum_{s'} P(s' \mid s, a)\,(R(s, a, s') + \gamma \max_{a'} Q(s', a'))$
• Learn the optimal policy by learning Q values. Use each experience $s, a, r, s'$ to update $Q[s, a]$.

Q-Learning

Procedure Q-Learning(S, A, γ, α, ε)
  Inputs: states S, actions A, discount γ, step size α, exploration factor ε
  Initialize Q[S, A] to zeros
  Repeat for multiple episodes:
    s ← initial state
    Repeat until end of episode:
      Select action a using the ε-greedy strategy
      Do action a. Observe reward r and state s′
      Q[s, a] ← Q[s, a] + α (r + γ max_{a′} Q[s′, a′] − Q[s, a])
      s ← s′
  return Q
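A minimal tabular sketch of this procedure in Python. The environment interface (reset() returning a state and step(action) returning (next_state, reward, done)) is an assumption of the sketch, not something defined on the slides:

```python
import random
from collections import defaultdict

def q_learning(env, actions, gamma=0.9, alpha=0.1, epsilon=0.1, episodes=1000):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    Q = defaultdict(float)             # Q[(state, action)], zero-initialized
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s_next, r, done = env.step(a)
            # Off-policy target: the best action in the next state.
            best_next = max(Q[(s_next, x)] for x in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```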

Q-Learning Update

Unpacking the Q-learning update:
  $Q[s, a] \leftarrow Q[s, a] + \alpha\,(r + \gamma \max_{a'} Q[s', a'] - Q[s, a])$
• $Q[s, a]$: the value of doing action $a$ from state $s$.
• $r, s'$: the reward received and the next state.
• $\alpha, \gamma$: the step size and the discount factor.
• $\max_{a'} Q[s', a']$: with the current Q values, the value of the optimal action from state $s'$.
• $r + \gamma \max_{a'} Q[s', a']$: with the current Q values, the discounted reward to be averaged in.

Robot Q-Learner, γ = 0.9, α = ε = 0.1

[Figure: the grid world with the learned Q value for each state–action pair, including the +1 and −1 terminal squares.]

Problems with Q-Learning

• Q-learning does off-policy learning: it learns the value of the optimal policy, but does not follow it.
• This is bad if exploration is dangerous. In the cliff-walking example, Q-learning walks off the cliff too much.
• On-policy learning learns the value of the policy being followed.
• SARSA uses the experience $s, a, r, s', a'$ to update $Q[s, a]$.

SARSA Procedure

Procedure SARSA(S, A, γ, α, ε)
  Initialize Q[S, A] to zeros
  Repeat for multiple episodes:
    s ← initial state
    Select action a using the ε-greedy strategy
    Repeat until end of episode:
      Do action a. Observe reward r and state s′
      Select action a′ for s′ using the ε-greedy strategy
      Q[s, a] ← Q[s, a] + α (r + γ Q[s′, a′] − Q[s, a])
      s ← s′ and a ← a′
  return Q
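A matching sketch of SARSA under the same assumed environment interface as the Q-learning sketch above:

```python
import random
from collections import defaultdict

def sarsa(env, actions, gamma=0.9, alpha=0.1, epsilon=0.1, episodes=1000):
    """Tabular SARSA: on-policy, so the update uses the action actually chosen next."""
    def choose(Q, s):
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda x: Q[(s, x)])

    Q = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        a = choose(Q, s)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = choose(Q, s_next)
            # On-policy target: Q of the next state and the next chosen action.
            Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
            s, a = s_next, a_next
    return Q
```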

SARSA on the Cliff

[Figure: reward per episode (0 to −100) versus number of episodes (0–1000) on the cliff task, comparing Q-learning and SARSA.]

Reinforcement Learning with Features

• Often, we want to reason in terms of features.
• We want to take advantage of similarities between states.
• Each assignment to the features is a state.
• Idea: express Q as a function of the features, where the features encode both the state and the action:
  $(s, a) = (x_1, x_2, x_3, \ldots)$
  $Q(s, a) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots$
  $\delta = r + \gamma Q(s', a') - Q(s, a)$
  $w_i \leftarrow w_i + \alpha\,\delta\,x_i$
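A minimal NumPy sketch of this linear-feature update (treating a leading constant feature as the stand-in for $w_0$ is an assumption of the sketch):

```python
import numpy as np

def linear_q(weights, features):
    """Q(s, a) as a linear function of the (state, action) feature vector."""
    return weights @ features

def feature_td_update(weights, features, reward, next_features, alpha=0.1, gamma=0.9):
    """One on-policy update of the weights.

    features / next_features: feature vectors for (s, a) and (s', a'),
    each starting with a constant 1 feature that plays the role of w0.
    """
    delta = reward + gamma * linear_q(weights, next_features) - linear_q(weights, features)
    return weights + alpha * delta * features
```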

Learning Bayesian Networks

Learn each probability separately, if you:
• know the structure of the network,
• observe all the variables,
• have many examples, and
• have no missing data.

Learning Conditional Probabilities

• Use counts for each conditional probability. For example:
  $P(E{=}t \mid A{=}t \wedge B{=}f) = \dfrac{\mathrm{count}(E{=}t \wedge A{=}t \wedge B{=}f) + c_1}{\mathrm{count}(A{=}t \wedge B{=}f) + c}$
  where $c_1$ and $c$ represent prior (expert) knowledge ($c_1 \le c$).
• When there are few examples or a node has many parents, there might be little data for the probability estimates:
  – Use supervised learning or noisy ORs/ANDs.
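A small sketch of this counting estimate (the dict-per-example data representation, the function name, and the default pseudocounts are assumptions of the sketch):

```python
def conditional_probability(examples, child, child_value, parents, c1=1, c=2):
    """Estimate P(child = child_value | parents) from counts with pseudocounts.

    examples: list of dicts mapping variable name -> observed value.
    parents: dict of parent variable name -> required value.
    c1, c: prior pseudocounts with c1 <= c, as on the slide.
    """
    matching = [e for e in examples
                if all(e[p] == v for p, v in parents.items())]
    hits = sum(1 for e in matching if e[child] == child_value)
    return (hits + c1) / (len(matching) + c)

# Hypothetical usage: P(E=True | A=True, B=False) from a tiny data set.
data = [{"A": True, "B": False, "E": True},
        {"A": True, "B": False, "E": False},
        {"A": True, "B": True,  "E": True}]
p = conditional_probability(data, "E", True, {"A": True, "B": False})
```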

Unobserved Variables

• What if we had no observations of E?
• Use the EM algorithm with probabilistic inference:
  – Randomly assign values to the probability tables that include E.
  – Repeat:
    – E-step: calculate P(E | e) for each example e.
    – M-step: update the probability tables using the counts weighted by P(E | e).
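To show the weighted-count idea in its simplest form, here is a sketch of the M-step update for the single table P(E), assuming the E-step posteriors have already been computed by some probabilistic inference routine (the same expected-count idea applies to every table that mentions E):

```python
def m_step_prob_e(posteriors):
    """M-step estimate of P(E = true) from expected counts.

    posteriors[i] is P(E = true | example i), produced by the E-step.
    Each example contributes a fractional ("soft") count of E being true.
    """
    return sum(posteriors) / len(posteriors)
```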

Learning Structures

$P(M \mid D) = \dfrac{P(D \mid M)\,P(M)}{P(D)}$

$\log P(M \mid D) \propto \log P(D \mid M) + \log P(M)$

• M is a Bayesian network model and D is the data.
• Assume all variables are observed.
• A bigger network can have a higher P(M | D).
• P(M) can help control the size (e.g., using the description length).
• You can search over network structures looking for the most likely model.

Algorithm I

• Search over total orderings of the variables.
• For each total ordering $X_1, \ldots, X_n$, use supervised learning to learn $P(X_i \mid X_1, \ldots, X_{i-1})$.
• Return the network model found with minimum $-\log P(D \mid M) - \log P(M)$:
  – $\log P(D \mid M)$ can be obtained by calculation.
  – $\log P(M)$ can be approximated as $\log P(M) \approx -m \log(d+1)$, where m is the number of parameters in M and d is the number of examples.

Algorithm II

• Learn a tree-structured Bayesian network.
• Compute the correlations between all pairs of variables.
• Find a maximum spanning tree, maximizing the absolute values of the correlations (a sketch follows this list).
• Pick a variable to be the root of the tree, then fill in the probabilities using the data.
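A minimal sketch of the tree-building step (choosing the first variable as root and using Prim's algorithm for the maximum spanning tree are assumptions of the sketch):

```python
import numpy as np

def tree_structure(data, names):
    """Maximum spanning tree over absolute pairwise correlations.

    data: (n_examples, n_vars) numeric array; names: list of variable names.
    Returns (parent, child) edges of a spanning tree rooted at names[0].
    """
    corr = np.abs(np.corrcoef(data, rowvar=False))
    n = len(names)
    in_tree = {0}                       # grow the tree from the chosen root
    edges = []
    while len(in_tree) < n:
        # Add the strongest-correlation edge from the tree to a new variable.
        i, j = max(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: corr[e])
        edges.append((names[i], names[j]))
        in_tree.add(j)
    return edges
```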

Example: Original Bayesian Network

Structure: A is the parent of B and C; B and C are the parents of D.

P(A) = 0.8

A   P(B | A)      A   P(C | A)
t   0.5           t   0.2
f   0.2           f   0.5

B   C   P(D | B, C)
t   t   0.5
t   f   0.8
f   t   0.2
f   f   0.4

Example: 1000 Examples

A B C D   Count     A B C D   Count
T T T T     39      F T T T      6
T T T F     37      F T T F      5
T T F T    300      F T F T     20
T T F F     69      F T F F     10
T F T T     15      F F T T     15
T F T F     67      F F T F     46
T F F T    117      F F F T     27
T F F F    189      F F F F     38

Example: Learn Probabilities

Structure: A is the parent of B and C; B and C are the parents of D.

P(A) = 0.83

A   P(B | A)      A   P(C | A)
t   0.53          t   0.19
f   0.25          f   0.43

B   C   P(D | B, C)
t   t   0.52
t   f   0.80
f   t   0.21
f   f   0.39

Example: EM for Hidden Variable

Structure: the hidden variable H is the parent of B and C; B and C are the parents of D.

P(H) = 0.25

H   P(B | H)      H   P(C | H)
t   0.83          t   0.02
f   0.37          f   0.30

B   C   P(D | B, C)
t   t   0.5
t   f   0.8
f   t   0.2
f   f   0.4

Example: Learn Structure

Correlation table:
      B      C      D
A    .22   −.21    .11
B          −.12    .41
C                 −.23

Learned tree: A → B → D → C.

P(A) = 0.83

A   P(B | A)      B   P(D | B)      D   P(C | D)
t   0.53          t   0.75          t   0.14
f   0.25          f   0.34          f   0.34
