COMP 307 — Lecture 07: Machine Learning 4, DT Learning and Perceptron Learning

VICTORIA UNIVERSITY OF WELLINGTON
Te Whare Wananga o te Upoko o te Ika a Maui
School of Engineering and Computer Science
Mengjie Zhang (Meng), [email protected]

Outline

• Decision tree learning:
  – Numeric attributes
  – Extensions for DT learning
• Perceptron learning:
  – Linear threshold unit
  – Threshold transfer functions
  – Perceptron learning rule/algorithm
  – Properties/problems of perceptrons

Numeric Attributes

  ID  Job    Saving  Family    Class
  A   true   $10K    single    Approve
  B   true   $7K     couple    Approve
  C   true   $16K    single    Approve
  D   true   $25K    single    Approve
  E   false  $12K    couple    Approve
  1   true   $4K     couple    Reject
  2   false  $30K    couple    Reject
  3   true   $15K    children  Reject
  4   false  $6K     single    Reject
  5   false  $8K     children  Reject

What question goes in the node “Saving”?

Numeric Attributes (Continued)

• Could split multiway — one branch for each value
  – bad idea, no generalisation
• Could split on a simple comparison, e.g. Saving > $10K (true/false)
  – But what split point?
• Could split on a subrange, e.g. $8K < Saving < $15K (true/false)
  – But which range?

Numeric Attributes (Continued)

Order the examples by Saving:

  ID  Job    Saving  Family    Class
  1   true   $4K     couple    Reject
  4   false  $6K     single    Reject
  B   true   $7K     couple    Approve
  5   false  $8K     children  Reject
  A   true   $10K    single    Approve
  E   false  $12K    couple    Approve
  3   true   $15K    children  Reject
  C   true   $16K    single    Approve
  D   true   $25K    single    Approve
  2   false  $30K    couple    Reject

• Don’t need to try each possible split point
  – Order the items and only consider class boundaries!

Numeric Attributes to Binary

Consider the class boundaries and choose the best split:

• (Saving < 7):  0.2 imp(0:2) + 0.8 imp(5:3) = 0.188
• (Saving < 8):  0.3 imp(1:2) + 0.7 imp(4:3) = 0.238
• (Saving < 10): 0.4 imp(1:3) + 0.6 imp(4:2) = 0.208
• (Saving < 15): 0.6 imp(3:3) + 0.4 imp(2:2) = 0.250
• (Saving < 16): 0.7 imp(3:4) + 0.3 imp(2:1) = 0.238
• (Saving < 30): 0.9 imp(5:4) + 0.1 imp(0:1) = 0.222

Which one should we choose?
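The split-point calculation above is mechanical enough to code directly. Below is a minimal sketch in Python; it assumes imp(a:b) is the P(A)·P(B) purity measure from the earlier DT lectures, i.e. imp(a:b) = ab/(a+b)^2, which reproduces all six weighted-impurity values listed:

    # Choose the best binary split on a numeric attribute by checking
    # only the class boundaries, assuming imp(a:b) = a*b / (a+b)^2.

    def imp(a, b):
        """Impurity of a branch holding a Approve and b Reject examples."""
        n = a + b
        return a * b / (n * n) if n else 0.0

    # (Saving in $K, class) pairs from the table, already sorted by Saving.
    data = [(4, 'R'), (6, 'R'), (7, 'A'), (8, 'R'), (10, 'A'),
            (12, 'A'), (15, 'R'), (16, 'A'), (25, 'A'), (30, 'R')]

    def best_split(data):
        n = len(data)
        best = None
        for i in range(1, n):
            if data[i - 1][1] == data[i][1]:
                continue                      # not a class boundary: skip
            left, right = data[:i], data[i:]
            la = sum(c == 'A' for _, c in left)    # Approves on the left
            ra = sum(c == 'A' for _, c in right)   # Approves on the right
            w = (len(left) / n) * imp(la, len(left) - la) \
                + (len(right) / n) * imp(ra, len(right) - ra)
            if best is None or w < best[0]:
                best = (w, data[i][0])        # test is "Saving < data[i][0]"
        return best

    print(best_split(data))   # -> (0.1875, 7)

The lowest weighted impurity is 0.188 at (Saving < 7), so that is the test to put in the “Saving” node.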
Extension for DT Learning

• Information-theoretic purity measures:
  – impurity = \sum_{class} P(class) \log(1 / P(class))
  – Works for multiple classes
• Pruning:
  – Shrink the learned decision trees
  – Eliminate “overfitting”
• Multi-attribute decisions:
  – Non-axis-parallel hyperplanes
• Turning learned decision trees into symbolic rules
• Regression trees:
  – Leaves can be regression functions

Why Perceptrons and Neural Networks

• How do human beings learn and do classification?
• They use brains (and eyes etc.)
• However, the machine learning methods discussed so far don’t seem realistic for implementing/simulating in human brains
• Can we use our knowledge of brain structures to suggest alternative learning mechanisms?
• Using neurons:
  – Neurons support parallel distributed processing
  – Many methods weren’t doing well on hard, non-linear problems
  – Linear threshold units (simple perceptrons)
  – Neural networks (multiple layers, more complex structure)

Neuron

[Figure: a biological neuron (dendrites, cell body, nucleus, axon, synapses) next to its artificial counterpart: inputs x_1 … x_n with weights w_1 … w_n feeding a threshold unit T with output y]

• The neuron receives impulses (signals) from other neurons via its dendrites
• The neuron sends impulses to other neurons via its axon
• Synapse: where the axon of one neuron meets a dendrite of another
• Impulses cause neurotransmitters (chemicals) to diffuse across the synapse
• They enhance (excite) or inhibit the receiving neuron
• The action is adjusted (by learning?)

Perceptron

• A kind of artificial neuron
• The simplest kind of neural network
• A linear threshold unit:
  – Netinput = \sum_{i=1}^{n} x_i w_i
  – if Netinput ≥ T then y = 1 else y = 0

Perceptron (Continued)

• Transfer functions: the threshold function and the sigmoid/logistic function

[Figure: the threshold (step) transfer function and the sigmoid/logistic transfer function]

• Input values (or “features”) may be binary (0/1, usually) or numeric
• Output: binary (1 or 0)
• Problems:
  – How do you learn the weights?
  – How do you choose the input features?

Simplify the Formula

• Replace the threshold with a “dummy” feature:
  – Set x_0 = 1
  – Let w_0 = −T
• Change the rule
  if \sum_{i=1}^{n} w_i x_i > T then y = 1 else y = 0
  to
  if \sum_{i=0}^{n} w_i x_i > 0 then y = 1 else y = 0
• The weights in the perceptron can be learned by an algorithm

Perceptron Learning Algorithm

• Initialise the weights to (small) random numbers
• Present an example (+ve/1 or -ve/0)
  – If the perceptron is correct, do nothing
  – If -ve example and wrong:
    ∗ (weights on active features are too big / threshold is too low)
    ∗ Subtract the feature vector from the weight vector
  – If +ve example and wrong:
    ∗ (weights on active features are too small / threshold is too high)
    ∗ Add the feature vector to the weight vector
• Repeat the above steps for every input-output pair until y = d for every pattern pair

Perceptron Classifier

• Consider a perceptron with two inputs, n = 2 (\vec{x} = (x_1, x_2))
• Output is true (1) if w_0 + w_1 x_1 + w_2 x_2 ≥ 0
  Output is false (0) if w_0 + w_1 x_1 + w_2 x_2 < 0
• Thus the line given by w_0 + w_1 x_1 + w_2 x_2 = 0, i.e. x_2 = −(w_1/w_2) x_1 − w_0/w_2, is the separating line between the two regions
• If n = 3 there will be a separating plane; if n > 3 there will be a separating hyperplane
• A training set is linearly separable if the data points corresponding to the classes can be separated by such a line
• Perceptron convergence theorem: the perceptron training algorithm will converge if and only if the training set is linearly separable

Perceptron Learning (Continued)

• What can the perceptron learn?
• It can learn to discriminate linearly separable categories such as AND and OR

[Figure: decision boundaries for (a) AND, (b) OR, (c) XOR; a single line separates the AND and OR points, but no line separates the XOR points]

• XOR is not linearly separable: there is no way to draw a line that correctly classifies all points
• The perceptron cannot learn the XOR function! (Proved in 1969 by Minsky and Papert, a controversial result.)
• Perhaps the perceptron is not really useful if it can’t compute something as simple as XOR?

Perceptron Learning (Continued)

• Another example: “E” vs “not E”

[Figure: pixel grids of “E” and “not E” patterns, with the pixels used as input features marked]

• If we use the pixel values marked in the figure as input features, can the perceptron method successfully solve this classification problem?
• Need better features!
• Need better network architecture!
• Need better learning algorithm!
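The learning rule above is short enough to run as written. Here is a minimal sketch in Python, using the dummy feature x_0 = 1 from “Simplify the Formula”; the max_epochs cap is an addition of this sketch (not part of the original algorithm), needed because on a non-separable set such as XOR the loop would otherwise never terminate:

    import random

    def train_perceptron(examples, max_epochs=100):
        """examples: list of (feature vector, desired output d in {0, 1})."""
        n = len(examples[0][0])
        w = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]  # w[0] acts as -T
        for _ in range(max_epochs):
            all_correct = True
            for x, d in examples:
                xs = [1] + list(x)                   # prepend dummy x0 = 1
                y = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else 0
                if y == d:
                    continue                         # correct: do nothing
                all_correct = False
                if d == 0:                           # -ve example and wrong
                    w = [wi - xi for wi, xi in zip(w, xs)]  # subtract features
                else:                                # +ve example and wrong
                    w = [wi + xi for wi, xi in zip(w, xs)]  # add features
            if all_correct:
                return w                             # y = d for every pattern
        return None                                  # did not converge

    AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    print(train_perceptron(AND))  # a separating weight vector: AND is learnable
    print(train_perceptron(XOR))  # None: XOR is not linearly separable

As the convergence theorem promises, the linearly separable AND set converges to some separating weight vector, while XOR keeps cycling until the epoch cap runs out.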
Perceptron Networks

[Figure: a single perceptron with inputs x_1 … x_n, weights w_1 … w_n, threshold T and output y; and a perceptron network with n inputs and m outputs]

• Perceptron (linear threshold unit): n inputs, 1 output
• Perceptron network: n inputs, m outputs
• Input units just pass their input activation unchanged along all output arrows
• Two layers of nodes, one layer of weights
• Still needs improvement: multilayer perceptrons, or feed-forward neural networks!

Summary

• Numeric attributes in decision tree learning
• Information gain (IG) purity measure in decision trees
• Perceptron structure
• Perceptron learning algorithm
• Limitations of perceptron learning
• Next lecture: Neural networks
• Reading: textbook section 20.5 (2nd edition) or section 18.7 (3rd edition), or Web materials
