COMP 307 — Lecture 07: Machine Learning 4, DT Learning and Perceptron Learning
VICTORIA UNIVERSITY OF WELLINGTON
Te Whare Wananga o te Upoko o te Ika a Maui
VUW School of Engineering and Computer Science

COMP 307 — Lecture 07
Machine Learning 4
DT Learning and Perceptron Learning

Mengjie Zhang (Meng)
[email protected]

Outline
• Decision tree learning:
  – Numeric attributes
  – Extensions for DT learning
• Perceptron learning:
  – Linear threshold units
  – Threshold transfer functions
  – Perceptron learning rule/algorithm
  – Properties/problems of perceptrons

Numeric Attributes

     Job    Saving  Family    Class
A    true   $10K    single    Approve
B    true   $7K     couple    Approve
C    true   $16K    single    Approve
D    true   $25K    single    Approve
E    false  $12K    couple    Approve
1    true   $4K     couple    Reject
2    false  $30K    couple    Reject
3    true   $15K    children  Reject
4    false  $6K     single    Reject
5    false  $8K     children  Reject

What question goes in the node "Saving"?
• Could split multiway, with one branch for each value
  – A bad idea: no generalisation
• Could split on a simple comparison, e.g. "Saving > $10K" (true/false)
  – But what split point?
• Could split on a subrange, e.g. "$8K < Saving < $15K" (true/false)
  – But which range?

Numeric Attributes (Continued)

Order the items by Saving:
1    true   $4K     couple    Reject
4    false  $6K     single    Reject
B    true   $7K     couple    Approve
5    false  $8K     children  Reject
A    true   $10K    single    Approve
E    false  $12K    couple    Approve
3    true   $15K    children  Reject
C    true   $16K    single    Approve
D    true   $25K    single    Approve
2    false  $30K    couple    Reject

Numeric Attributes to Binary
Consider the class boundaries and choose the best split. Below, imp(a:b) is the impurity of a branch containing a Approve and b Reject examples, and each branch is weighted by the fraction of examples it receives:
• (Saving < 7):  0.2 imp(0:2) + 0.8 imp(5:3) = 0.188
• (Saving < 8):  0.3 imp(1:2) + 0.7 imp(4:3) = 0.238
• (Saving < 10): 0.4 imp(1:3) + 0.6 imp(4:2) = 0.208
• (Saving < 15): 0.6 imp(3:3) + 0.4 imp(2:2) = 0.250
• (Saving < 16): 0.7 imp(3:4) + 0.3 imp(2:1) = 0.238
• (Saving < 30): 0.9 imp(5:4) + 0.1 imp(0:1) = 0.222
Which one should we choose? The split with the lowest weighted impurity: Saving < 7.
• There is no need to try every possible split point
  – Order the items and only consider class boundaries!
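A minimal Python sketch of this split-selection procedure. The slides do not define imp() on this page; the values above are consistent with the two-class product impurity imp(a:b) = (a/(a+b)) · (b/(a+b)), so the sketch assumes that measure, and all function and variable names are illustrative.

```python
# Sketch: pick the best binary split point for a numeric attribute by
# sorting the examples and testing only the class boundaries.
# Assumes the product impurity imp(a:b) = p(1 - p); names are illustrative.

def impurity(approve, reject):
    total = approve + reject
    if total == 0:
        return 0.0
    p = approve / total
    return p * (1 - p)

def best_split(examples):
    """examples: (saving, class) pairs; returns (threshold, weighted impurity)."""
    data = sorted(examples)          # order the items by the numeric attribute
    n = len(data)
    best = None
    for i in range(1, n):
        left, right = data[:i], data[i:]
        if left[-1][1] == right[0][1]:
            continue                 # not a class boundary: cannot be optimal
        threshold = right[0][0]      # candidate test: Saving < threshold
        la = sum(1 for _, c in left if c == "Approve")
        ra = sum(1 for _, c in right if c == "Approve")
        score = (len(left) / n) * impurity(la, len(left) - la) \
              + (len(right) / n) * impurity(ra, len(right) - ra)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

examples = [(10, "Approve"), (7, "Approve"), (16, "Approve"), (25, "Approve"),
            (12, "Approve"), (4, "Reject"), (30, "Reject"), (15, "Reject"),
            (6, "Reject"), (8, "Reject")]
print(best_split(examples))          # (7, 0.1875), i.e. the "Saving < 7" split
```

Run on the table above, this visits exactly the six class boundaries listed on the slide and returns the "Saving < 7" split with weighted impurity 0.1875 (0.188 rounded).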
Extension for DT Learning
• Information-theoretic purity measures:
  – impurity = Σ_class P(class) log(1/P(class))
  – These work for multiple classes
• Pruning:
  – Shrink the learned decision tree
  – Eliminates "overfitting"
• Multi-attribute decisions:
  – Non-axis-parallel hyperplanes
• Turning learned decision trees into symbolic rules
• Regression trees:
  – Leaves can be regression functions

Why Perceptrons and Neural Networks
• How do human beings learn and do classification?
• We use our brains (and eyes etc.)
• However, the machine learning methods discussed so far don't seem realistic to implement or simulate in human brains
• Can we use our knowledge of brain structures to suggest alternative learning mechanisms?
• Using neurons:
  – Neurons support parallel distributed processing
  – Many existing methods weren't doing well on hard, non-linear problems
  – Linear threshold units (simple perceptrons)
  – Neural networks (multiple layers, more complex structures)

Neuron
(Figure: a biological neuron, showing dendrites, cell body, nucleus, axon, and synapses)
• A neuron receives impulses (signals) from other neurons via its dendrites
• A neuron sends impulses to other neurons via its axon
• Synapse: the junction between the axon of one neuron and a dendrite of another
• Impulses cause neurotransmitters (chemicals) to diffuse across the synapse
• A synapse can enhance (excite) or inhibit the signal
• Its action is adjusted (by learning?)

Perceptron
• A kind of artificial neuron
• The simplest kind of neural network
• A linear threshold unit:
  – Netinput = Σ_{i=1..n} xi wi
  – If Netinput ≥ T then y = 1, else y = 0
(Figure: a linear threshold unit with inputs x1..xn, weights w1..wn, threshold T, and output y)

Perceptron (Continued)
• Transfer functions:
(Figure: a threshold (step) function and a sigmoid/logistic function)
• Input values ("features") may be binary (0/1, usually) or numeric
• Output: binary (1 or 0)
• The weights in the perceptron can be learned by an algorithm
• Problems:
  – How do you learn the weights?
  – How do you choose the input features?

Simplify the Formula
• Replace the threshold with a "dummy" feature:
  – Set x0 = 1
  – Let w0 = −T
• This changes the rule
  if Σ_{i=1..n} wi xi ≥ T then y = 1 else y = 0
  to
  if Σ_{i=0..n} wi xi ≥ 0 then y = 1 else y = 0

Perceptron Learning Algorithm
• Initialise the weights to (small) random numbers
• Present an example (+ve/1 or −ve/0):
  – If the perceptron is correct, do nothing
  – If the example is −ve and the perceptron is wrong:
    (the weights on active features are too big / the threshold is too low)
    subtract the feature vector from the weight vector
  – If the example is +ve and the perceptron is wrong:
    (the weights on active features are too small / the threshold is too high)
    add the feature vector to the weight vector
• Repeat the above steps for every input–output pair until y = d (the desired output) for every pattern pair
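A minimal Python sketch of this training loop, using the dummy-feature trick from the previous slide (x0 = 1, w0 = −T) so the threshold is learned like any other weight. The epoch cap and all names are illustrative additions, not part of the slides.

```python
import random

# Sketch of the perceptron learning rule above. The dummy feature x0 = 1
# folds the threshold into the weight vector (w0 = -T); the max_epochs
# cap is an illustrative guard against non-separable training sets.

def predict(weights, x):
    net = sum(w * xi for w, xi in zip(weights, x))
    return 1 if net >= 0 else 0

def train_perceptron(examples, n_features, max_epochs=1000):
    """examples: (features, desired) pairs, with desired in {0, 1}."""
    # Initialise weights to (small) random numbers
    weights = [random.uniform(-0.5, 0.5) for _ in range(n_features + 1)]
    for _ in range(max_epochs):
        all_correct = True
        for features, desired in examples:
            x = [1] + list(features)     # prepend the dummy feature x0 = 1
            y = predict(weights, x)
            if y == desired:
                continue                 # correct: do nothing
            all_correct = False
            if desired == 0:             # -ve example, wrong: weights too big
                weights = [w - xi for w, xi in zip(weights, x)]
            else:                        # +ve example, wrong: weights too small
                weights = [w + xi for w, xi in zip(weights, x)]
        if all_correct:                  # y = d for every pattern pair
            return weights
    return weights                       # did not converge within the cap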
Perceptron Classifier
• Consider a perceptron with two inputs, n = 2 (x = (x1, x2))
• Output is true (1) if w0 + w1 x1 + w2 x2 ≥ 0
  Output is false (0) if w0 + w1 x1 + w2 x2 < 0
• Thus the line given by w0 + w1 x1 + w2 x2 = 0, i.e.
  x2 = −(w1/w2) x1 − w0/w2,
  is the separating line between the two regions
• If n = 3 there is a separating plane; if n > 3 there is a separating hyperplane
• A training set is linearly separable if the data points corresponding to the classes can be separated by a line (or hyperplane)
• Perceptron convergence theorem: the perceptron training algorithm will converge if and only if the training set is linearly separable

Perceptron Learning (Continued)
• What can the perceptron learn?
• It can learn to discriminate linearly separable categories such as AND and OR
(Figure: (a) AND and (b) OR are linearly separable; (c) XOR is not)
• XOR is not linearly separable: there is no way to draw a line that correctly classifies all the points
• The perceptron cannot learn the XOR function! (Proved in 1969 by Minsky and Papert; a controversial result. A short demo appears after the summary below.)
• Perhaps the perceptron is not really useful if it can't compute something as simple as XOR?

Perceptron Learning (Continued)
• Another example: "E" vs "not E"
(Figure: pixel images of "E" and "not E" letters, with some pixels marked as input features)
• If we use the pixel values marked in the figure as input features, can the perceptron method successfully solve this classification problem?
• We need better features!
• We need better network architectures!
• We need better learning algorithms!

Perceptron Networks
• Perceptron (linear threshold unit): n inputs, 1 output
(Figure: a single perceptron with inputs x1..xn, weights w1..wn, threshold T, and output y)
• Perceptron network: n inputs, m outputs
(Figure: a two-layer network with an input layer connected to an output layer)
• Input units just pass their input activation unchanged along all outgoing arrows
• Two layers of nodes, one layer of weights
• Still needs improvement:
  – Multilayer perceptrons, or
  – Feed-forward neural networks!

Summary
• Numeric attributes in decision tree learning
• IG purity measure in decision trees
• Perceptron structure
• Perceptron learning algorithm
• Limitations of perceptron learning
• Next lecture: neural networks
• Reading: textbook section 20.5 (2nd edition) or section 18.7 (3rd edition), or web materials
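The demo promised above: a short, hypothetical run of the train_perceptron sketch from earlier on AND (linearly separable, so training converges) and on XOR (not linearly separable, so no weight vector can ever classify all four patterns correctly).

```python
# AND is linearly separable; XOR is not.
AND_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w_and = train_perceptron(AND_data, n_features=2)
print([predict(w_and, [1, *f]) for f, _ in AND_data])  # [0, 0, 0, 1]: AND learned

w_xor = train_perceptron(XOR_data, n_features=2)
print([predict(w_xor, [1, *f]) for f, _ in XOR_data])  # never equals [0, 1, 1, 0]
```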