Unsupervised Learning
CS 3793/5233 Artificial Intelligence – Unsupervised Learning

Clustering
In clustering, the target feature is not given.
Goal: construct a natural classification that can be used to predict features of the data.
The examples are partitioned into clusters or classes.
Each class predicts values of the features for the examples in the class.
In hard clustering, each example is placed definitively in a class.
In soft clustering, each example has a probability of belonging to each class.
The best clustering minimizes an error measure.
EM Algorithm
The EM (Expectation Maximization) algorithm is not a single algorithm, but an algorithm design technique.
Start with a hypothesis space for classifying the data and a random hypothesis.
Repeat until convergence:
– E step: classify the examples using the current hypothesis.
– M step: learn a new hypothesis from the examples using their current classification.
EM can get stuck in local optima; different initializations can affect the result.
k-Means Algorithm
The k-means algorithm is used for hard clustering.
Inputs:
– the training examples
– the number of classes/clusters, k
Outputs:
– an assignment of each example to one class
– the average/mean example of each class
If example e = (x_1, ..., x_n) is assigned to class i with mean u_i = (u_i1, ..., u_in), the error is
  ||e − u_i||^2 = Σ_{j=1}^{n} (x_j − u_ij)^2
k-Means Procedure
Procedure k-means(E, k)
  Inputs: set of examples E and number of classes k
  Randomly assign each example to a class
  Let E_i be the examples in class i
  Repeat
    M step: for each class i from 1 to k
      u[i] ← (Σ_{e ∈ E_i} e) / |E_i|
    E step: for each example e in E
      put e in class argmin_i ||u[i] − e||^2
  until no changes in any E_i
  return u and the clusters E_i
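The procedure above can be sketched in Python. This is a minimal illustration using plain lists and squared Euclidean distance, not code from the course; the seeded random initialization is an assumption made for reproducibility.

```python
import random

def k_means(examples, k, seed=0):
    """Hard clustering by k-means: alternate M and E steps until stable."""
    rng = random.Random(seed)
    # Randomly assign each example to one of the k classes.
    assign = [rng.randrange(k) for _ in examples]
    n = len(examples[0])
    while True:
        # M step: the mean of each class (a zero mean stands in if a
        # class happens to be empty).
        means = []
        for i in range(k):
            members = [e for e, a in zip(examples, assign) if a == i]
            if members:
                means.append([sum(col) / len(members) for col in zip(*members)])
            else:
                means.append([0.0] * n)
        # E step: put each example in the class with the closest mean.
        new_assign = [
            min(range(k),
                key=lambda i: sum((x - u) ** 2 for x, u in zip(e, means[i])))
            for e in examples
        ]
        if new_assign == assign:      # stable: no assignment changed
            return means, assign
        assign = new_assign
```

On well-separated data the procedure recovers the natural grouping regardless of the random starting assignment.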
Example Data
[Figure: scatter plot of the example data.]
Random Assignment to Classes
[Figure: the examples randomly assigned to classes.]
Assign to Closest Mean
[Figure: each example assigned to the class with the closest mean.]
Assign to Closest Mean Again
[Figure: the assignments after another round of assigning to the closest mean.]
Properties of k-Means
An assignment of examples to classes is stable if running both the M step and the E step does not change the assignment.
The algorithm converges to a stable local minimum.
It is not guaranteed to converge to a global minimum.
It is sensitive to the relative scale of the dimensions.
Increasing k can always decrease the error, until k is the number of distinct examples.
Soft k-Means
To illustrate soft clustering, consider a "soft" k-means algorithm.
E step: for each example e, calculate the probability distribution P(c_i | e):
  P(c_i | e) ∝ exp{−||u_i − e||^2}
M step: for each class i, determine the mean probabilistically:
  u_i = (Σ_{e ∈ E} P(c_i | e) · e) / (Σ_{e ∈ E} P(c_i | e))
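One round of the soft E and M steps can be sketched as follows. This is an illustration assuming the unnormalized weights exp(−||u_i − e||^2) given above; the function name is my own.

```python
import math

def soft_k_means_step(examples, means):
    """One E step and one M step of soft k-means.
    Returns (responsibilities, new_means)."""
    k = len(means)
    # E step: P(c_i | e) proportional to exp(-||u_i - e||^2),
    # normalized over the k classes.
    resp = []
    for e in examples:
        weights = [math.exp(-sum((x - u) ** 2 for x, u in zip(e, m)))
                   for m in means]
        total = sum(weights)
        resp.append([w / total for w in weights])
    # M step: each mean is the probability-weighted average of the examples.
    new_means = []
    for i in range(k):
        denom = sum(r[i] for r in resp)
        new_means.append([
            sum(r[i] * e[j] for e, r in zip(examples, resp)) / denom
            for j in range(len(examples[0]))
        ])
    return resp, new_means
```

Unlike hard k-means, every example pulls on every mean, weighted by its class probability.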
Soft k-Means Example
e           P0(Cx | e)   P1(Cx | e)   P2(Cx | e)
(0.7, 5.1)  0.0          0.013        0.0
(1.5, 6.0)  1.0          0.764        0.0
(2.1, 4.5)  1.0          0.004        0.0
(2.4, 5.5)  0.0          0.453        0.0
(3.0, 4.4)  0.0          0.007        0.0
(3.5, 5.0)  1.0          0.215        0.0
(4.5, 1.5)  0.0          0.000        0.0
(5.2, 0.7)  0.0          0.000        0.0
(5.3, 1.8)  0.0          0.000        0.0
(6.2, 1.7)  0.0          0.000        0.0
(6.7, 2.5)  1.0          0.000        0.0
(8.5, 9.2)  1.0          1.000        1.0
(9.1, 9.7)  1.0          1.000        1.0
(9.5, 8.5)  0.0          1.000        1.0
Properties of Soft Clustering
Soft clustering often uses a parameterized probability model, e.g., means and standard deviations for a normal distribution.
Initially, assign random probabilities to the examples: the probability of class i given example e.
The M step updates the values of the parameters from the probabilities.
The E step updates the probabilities of the examples from the probability model.
A global minimum is not guaranteed.
Reinforcement Learning
What should an agent do given:
– Prior knowledge: the possible states of the world and the possible actions
– Observations: the current state of the world and the immediate reward/punishment
– Goal: act to maximize accumulated reward
We assume there is a sequence of experiences: state, action, reward, state, action, reward, ...
At any time the agent must decide whether to explore to gain more knowledge, or to exploit the knowledge it has already discovered.
Why is reinforcement learning hard?
The actions responsible for a reward may have occurred long before the reward was received.
The long-term effect of an action depends on what the agent will do in the future.
The explore-exploit dilemma: at each time, should the agent be greedy or inquisitive?
– The ε-greedy strategy is to select what looks like the best action 1 − ε of the time, and to select a random action ε of the time.
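The ε-greedy rule is simple to state in code. A sketch; breaking ties toward the lowest-indexed best action is an implementation choice of this illustration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick the greedy action with probability 1 - epsilon,
    and a uniformly random action with probability epsilon.
    q_values: list of Q values indexed by action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                     # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit
```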
Temporal Differences
Suppose we have a sequence of values v_1, v_2, v_3, ...
Estimating the average with the first k values:
  A_k = (v_1 + · · · + v_k) / k
Separating out v_k:
  A_k = (v_1 + · · · + v_{k−1}) / k + v_k / k
Let α = 1/k; then
  A_k = (1 − α) A_{k−1} + α v_k = A_{k−1} + α (v_k − A_{k−1})
The TD update is: A ← A + α (v − A)
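The derivation above shows that with α = 1/k the TD update computes an exact running average, while a fixed α gives an exponentially weighted average that tracks recent values. A quick check:

```python
def td_average(values, alpha=None):
    """Running average via the TD update A <- A + alpha * (v - A).
    With alpha = 1/k (the default) the result is exactly the sample mean."""
    a = 0.0
    for k, v in enumerate(values, start=1):
        step = (1.0 / k) if alpha is None else alpha
        a += step * (v - a)
    return a
```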
Reinforcement Learning Example
Suppose a robot is in this grid environment.
[Figure: grid world with a +1 terminal square and a −1 terminal square.]
One terminal square has +1 reward (the recharge station).
One terminal square has −1 reward (falling down the stairs).
An action to stay put always succeeds.
An action to move to a neighboring square succeeds with probability 0.8, stays in the same square with probability 0.1, and goes to another neighbor with probability 0.1.
Should the robot try moving left or right?
Review of Q Values
A policy is a function from states to actions.
For reward sequence r_1, r_2, ..., the discounted reward is V = Σ_{i=1}^{∞} γ^{i−1} r_i (discount factor γ).
V(s) is the expected value of state s.
Q(s, a) is the value of doing action a from state s.
For the optimal policy:
  V(s) = max_a Q(s, a)  (the value of the best action)
  Q(s, a) = Σ_{s′} P(s′ | s, a) (R(s, a, s′) + γ V(s′))
          = Σ_{s′} P(s′ | s, a) (R(s, a, s′) + γ max_{a′} Q(s′, a′))
Learn the optimal policy by learning the Q values.
Use each experience s, a, r, s′ to update Q[s, a].
Q-Learning
Procedure Q-learning(S, A, γ, α, ε)
  Inputs: states S, actions A, discount γ, step size α, exploration factor ε
  Initialize Q[S, A] to zeros
  Repeat for multiple episodes:
    s ← initial state
    Repeat until end of episode:
      Select action a using the ε-greedy strategy
      Do action a; observe reward r and next state s′
      Q[s, a] ← Q[s, a] + α (r + γ max_{a′} Q[s′, a′] − Q[s, a])
      s ← s′
  return Q
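The procedure translates almost line for line into Python. The two-state chain environment below is a made-up stand-in for the robot world, and the `step(s, a) → (reward, next state, done)` interface is an assumption of this sketch.

```python
import random

def q_learning(step, n_states, n_actions, episodes=500,
               gamma=0.9, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning. `step(s, a)` returns (reward, next_state, done)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0                                   # assumed initial state
        done = False
        while not done:
            if rng.random() < epsilon:          # epsilon-greedy selection
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=q[s].__getitem__)
            r, s2, done = step(s, a)
            # Off-policy update toward r + gamma * max_a' Q[s'][a'].
            best_next = 0.0 if done else max(q[s2])
            q[s][a] += alpha * (r + gamma * best_next - q[s][a])
            s = s2
    return q

# Hypothetical chain: action 1 moves right and earns +1; reaching state 2
# ends the episode; action 0 stays put with reward 0.
def chain_step(s, a):
    if a == 1:
        return (1.0, s + 1, s + 1 == 2)
    return (0.0, s, False)
```

After training, the learned Q values prefer moving right in every non-terminal state of the chain.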
Q-Learning Update
Unpacking the Q-learning update:
  Q[s, a] ← Q[s, a] + α (r + γ max_{a′} Q[s′, a′] − Q[s, a])
Q[s, a]: the value of doing action a from state s.
r, s′: the reward received and the next state.
α, γ: the learning rate and the discount factor.
max_{a′} Q[s′, a′]: under the current Q values, the value of the optimal action from state s′.
r + γ max_{a′} Q[s′, a′]: under the current Q values, the discounted reward to be averaged in.
Robot Q-Learner, γ = 0.9, α = ε = 0.1
[Figure: the grid world annotated with the learned Q value for each state-action pair; the +1 and −1 squares are terminal.]
Problems with Q-Learning
Q-learning does off-policy learning: it learns the value of the optimal policy, but does not follow it.
This is bad if exploration is dangerous; in the cliff world below, Q-learning walks off the cliff too often.
On-policy learning learns the value of the policy that is actually being followed.
SARSA uses the experience s, a, r, s′, a′ to update Q[s, a].
SARSA Procedure
Procedure SARSA(S, A, γ, α, ε)
  Initialize Q[S, A] to zeros
  Repeat for multiple episodes:
    s ← initial state
    Select action a using the ε-greedy strategy
    Repeat until end of episode:
      Do action a; observe reward r and next state s′
      Select action a′ for s′ using the ε-greedy strategy
      Q[s, a] ← Q[s, a] + α (r + γ Q[s′, a′] − Q[s, a])
      s ← s′ and a ← a′
  return Q
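SARSA differs from Q-learning only in the backup target: it uses the Q value of the action a′ it actually selects, rather than the max over actions. A sketch on a hypothetical two-state chain; the `step(s, a) → (reward, next state, done)` interface and the chain environment are assumptions of this illustration.

```python
import random

def sarsa(step, n_states, n_actions, episodes=500,
          gamma=0.9, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular SARSA: on-policy, updates toward r + gamma * Q[s'][a']."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]

    def choose(s):
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        return max(range(n_actions), key=q[s].__getitem__)

    for _ in range(episodes):
        s = 0                                   # assumed initial state
        a = choose(s)
        done = False
        while not done:
            r, s2, done = step(s, a)
            a2 = None if done else choose(s2)   # next action, on-policy
            target = r if done else r + gamma * q[s2][a2]
            q[s][a] += alpha * (target - q[s][a])
            s, a = s2, a2
    return q

# Hypothetical chain: action 1 moves right and earns +1; reaching state 2
# ends the episode; action 0 stays put with reward 0.
def chain_step(s, a):
    if a == 1:
        return (1.0, s + 1, s + 1 == 2)
    return (0.0, s, False)
```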
SARSA on the Cliff
[Figure: reward per episode (−100 to 0) versus number of episodes (0 to 1000) on the cliff world, comparing Q-learning and SARSA.]
Reinforcement Learning with Features
Often, we want to reason in terms of features, and to take advantage of similarities between states.
Each assignment to the features is a state.
Idea: express Q as a function of the features, where the features encode both the state and the action:
  (s, a) = (x_1, x_2, x_3, ...)
  Q(s, a) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + ...
  δ = r + γ Q(s′, a′) − Q(s, a)
  w_i ← w_i + α δ x_i
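The linear representation and its update fit in a few lines. The function names and the explicit bias weight w_0 (with the implicit feature x_0 = 1) are assumptions of this sketch.

```python
def linear_q(weights, features):
    """Q(s, a) = w0 + w1*x1 + w2*x2 + ... for feature vector (x1, x2, ...)."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def feature_update(weights, features, r, next_q, q_sa, alpha, gamma):
    """One gradient-style update w_i <- w_i + alpha * delta * x_i,
    where delta = r + gamma * Q(s', a') - Q(s, a)."""
    delta = r + gamma * next_q - q_sa
    new = [weights[0] + alpha * delta]                     # bias: x0 = 1
    new += [w + alpha * delta * x for w, x in zip(weights[1:], features)]
    return new
```

A single update with reward 1 moves the predicted Q value toward 1 for any state-action pair sharing those features, which is exactly the generalization across similar states that the feature representation buys.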
Learning Bayesian Networks
You can learn each probability separately if you:
– know the structure of the network
– observe all the variables
– have many examples
– have no missing data
Learning Conditional Probabilities
Use counts for each conditional probability. For example:
  P(E = t | A = t ∧ B = f) = (count(E = t ∧ A = t ∧ B = f) + c1) / (count(A = t ∧ B = f) + c)
c1 and c encode prior (expert) knowledge, with c1 ≤ c.
When there are few examples, or a node has many parents, there may be little data for the probability estimates:
– use supervised learning or noisy ORs/ANDs.
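The counting estimate can be sketched directly, representing examples as dictionaries from variable names to values. The default pseudocounts c1 = 1, c = 2 (a Laplace-style prior for a boolean variable) are an assumption of this illustration.

```python
def cond_prob(examples, child, child_val, parents, c1=1.0, c=2.0):
    """Estimate P(child = child_val | parents) from counts with
    pseudocounts: (count(child & parents) + c1) / (count(parents) + c)."""
    match_parents = [e for e in examples
                     if all(e[v] == val for v, val in parents.items())]
    match_both = [e for e in match_parents if e[child] == child_val]
    return (len(match_both) + c1) / (len(match_parents) + c)
```

With no matching examples at all, the estimate falls back to the prior c1/c rather than dividing by zero.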
Unobserved Variables
What if we had no observations of E?
Use the EM algorithm with probabilistic inference.
– Randomly assign values to the probability tables that include E.
– Repeat:
  – E step: calculate P(E | e) for each example e.
  – M step: update the probability tables using the counts of P(E | e).
Learning Bayesian Network Structures
M is a Bayesian network and D is the data.
  P(M | D) = P(D | M) P(M) / P(D)
  log P(M | D) ∝ log P(D | M) + log P(M)
Assume all variables are observed.
A bigger network can have a higher likelihood P(D | M); the prior P(M) can help control the size (e.g., using the description length).
You can search over network structures looking for the most likely model.
Algorithm I
Search over total orderings of the variables.
For each total ordering X_1, ..., X_n, use supervised learning to learn P(X_i | X_1, ..., X_{i−1}).
Return the network model found with minimum −log P(D | M) − log P(M).
– log P(D | M) can be obtained by calculation.
– log P(M) can be approximated as log P(M) ≈ −m log(d + 1), where m is the number of parameters in M and d is the number of examples.
Algorithm II
Learn a tree-structured Bayesian network.
Compute the correlations between all pairs of variables.
Find a maximum spanning tree, maximizing the absolute values of the correlations.
Pick a variable to be the root of the tree, then fill in the probabilities using the data.
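The spanning-tree step (which resembles the Chow-Liu algorithm) can be sketched Prim-style. In this illustration the pairwise correlations are assumed to be given as a dictionary, and the first variable in the list is arbitrarily used as the root.

```python
def max_spanning_tree(names, corr):
    """Prim-style maximum spanning tree over |correlation|.
    corr[(x, y)] holds the correlation between variables x and y."""
    def weight(x, y):
        return abs(corr.get((x, y), corr.get((y, x), 0.0)))

    in_tree = {names[0]}          # arbitrary root
    edges = []
    while len(in_tree) < len(names):
        # Greedily add the strongest edge joining the tree to a new variable.
        x, y = max(((u, v) for u in in_tree for v in names if v not in in_tree),
                   key=lambda e: weight(*e))
        edges.append((x, y))
        in_tree.add(y)
    return edges
```

On the correlation table from the structure-learning example at the end of this section, this picks the edges A-B, B-D, and D-C.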
Example: Original Bayesian Network
Structure: A → B, A → C, and B, C → D.

P(A) = 0.8

A | P(B | A)      A | P(C | A)
t | 0.5           t | 0.2
f | 0.2           f | 0.5

B C | P(D | B, C)
t t | 0.5
t f | 0.8
f t | 0.2
f f | 0.4
Example: 1000 Examples
A B C D | Count      A B C D | Count
T T T T |    39      F T T T |     6
T T T F |    37      F T T F |     5
T T F T |   300      F T F T |    20
T T F F |    69      F T F F |    10
T F T T |    15      F F T T |    15
T F T F |    67      F F T F |    46
T F F T |   117      F F F T |    27
T F F F |   189      F F F F |    38
Example: Learn Probabilities
P(A) = 0.83

A | P(B | A)      A | P(C | A)
t | 0.53          t | 0.19
f | 0.25          f | 0.43

B C | P(D | B, C)
t t | 0.52
t f | 0.80
f t | 0.21
f f | 0.39
Example: EM for Hidden Variable
P(H) = 0.25

H | P(B | H)      H | P(C | H)
t | 0.83          t | 0.02
f | 0.37          f | 0.30

B C | P(D | B, C)
t t | 0.5
t f | 0.8
f t | 0.2
f f | 0.4
Example: Learn Structure
Correlation table:
      B      C      D
A    .22   −.21    .11
B          −.12    .41
C                 −.23

Learned structure: A → B → D → C.

P(A) = 0.83

A | P(B | A)
t | 0.53
f | 0.25

B | P(D | B)
t | 0.75
f | 0.34

D | P(C | D)
t | 0.14
f | 0.34