Review: Data Mining Techniques Outline Point Estimation Estimation

CS 341, Spring 2007
Lecture 4: Data Mining Techniques (I)
© Prentice Hall

Review: Data Mining
• Information Retrieval
  - Similarity measures
  - Evaluation metrics: precision and recall
• Question Answering
• Web Search Engine
  - An application of IR
  - Related to web mining

Data Mining Techniques Outline
Goal: Provide an overview of basic data mining techniques.
• Statistical
  - Point Estimation
  - Models Based on Summarization
  - Bayes Theorem
  - Hypothesis Testing
  - Regression and Correlation
• Similarity Measures

Point Estimation
• Point estimate: an estimate of a population parameter.
• May be made by calculating the parameter for a sample.
• May be used to predict a value for missing data.
• Ex:
  - R contains 100 employees
  - 99 have salary information
  - Mean salary of these is $50,000
  - Use $50,000 as the value of the remaining employee's salary.
  Is this a good idea?

Estimation Error
• Bias: difference between the expected value of the estimator and the actual value:
    Bias = E[θ̂] - θ
• Mean Squared Error (MSE): expected value of the squared difference between the estimate and the actual value:
    MSE(θ̂) = E[(θ̂ - θ)^2]
• Root Mean Square Error (RMSE): the square root of the MSE

Jackknife Estimate
• Jackknife estimate: the estimate of a parameter is obtained by omitting one value from the set of observed values.
• Named to describe a "handy and useful tool"
• Used to reduce bias
• Property: the jackknife estimator lowers the bias from the order of 1/n to 1/n^2

Jackknife Estimate
• Definition:
  - Divide the sample of size n into g groups of size m each, so n = mg (often m = 1 and g = n).
  - Estimate θ̂_j by ignoring the jth group.
  - θ̄ is the average of the θ̂_j.
  - The jackknife estimator is
    » Q = g θ̂ - (g-1) θ̄
    where θ̂ is an estimator for the parameter θ.

Jackknife Estimator: Example 1
• Estimate of mean for X = {x1, x2, x3}, n = 3, g = 3, m = 1, θ = µ = (x1 + x2 + x3)/3
• θ̂_1 = (x2 + x3)/2, θ̂_2 = (x1 + x3)/2, θ̂_3 = (x1 + x2)/2
• θ̄ = (θ̂_1 + θ̂_2 + θ̂_3)/3
• Q = g θ̂ - (g-1) θ̄ = 3 θ̂ - (3-1) θ̄ = (x1 + x2 + x3)/3
• In this case, the jackknife estimator is the same as the usual estimator.

Jackknife Estimator: Example 2
• Estimate of variance for X = {1, 4, 4}, n = 3, g = 3, m = 1, θ = σ^2
• σ^2 = ((1-3)^2 + (4-3)^2 + (4-3)^2)/3 = 2
• θ̂_1 = ((4-4)^2 + (4-4)^2)/2 = 0
• θ̂_2 = 2.25, θ̂_3 = 2.25
• θ̄ = (θ̂_1 + θ̂_2 + θ̂_3)/3 = 1.5
• Q = g θ̂ - (g-1) θ̄ = 3(2) - 2(1.5) = 3
• In this case, the jackknife estimator is different from the usual estimator.

Jackknife Estimator: Example 2 (cont'd)
• In general, apply the jackknife technique to the biased estimator σ^2:
    σ^2 = Σ (xi - x̄)^2 / n
• Then the jackknife estimator is s^2:
    s^2 = Σ (xi - x̄)^2 / (n - 1)
  which is known to be unbiased for σ^2.

Maximum Likelihood Estimate (MLE)
• Obtain parameter estimates that maximize the probability that the sample data occurs for the specific model.
• The joint probability of observing the sample data is obtained by multiplying the individual probabilities. Likelihood function:
    L(Θ | x1, ..., xn) = Π f(xi | Θ)
• Maximize L.

MLE Example
• Coin toss five times: {H,H,H,H,T}
• Assuming a perfect coin with H and T equally likely, the likelihood of this sequence is:
    L = (0.5)^5 = 0.03125
• However, if the probability of an H is 0.8, then:
    L = (0.8)^4 (0.2) = 0.08192

MLE Example (cont'd)
• General likelihood formula, with xi = 1 for H and xi = 0 for T:
    L(p | x1, ..., x5) = Π p^xi (1 - p)^(1 - xi)
• Estimate for p is then 4/5 = 0.8

Expectation-Maximization (EM)
• Solves estimation with incomplete data.
• Obtain initial estimates for parameters.
• Iteratively use estimates for missing data and continue until convergence.

EM Example
[figure]

EM Algorithm
[figure]

Models Based on Summarization
• Basic concepts to provide an abstraction and summarization of the data as a whole.
  - Statistical concepts: mean, variance, median, mode, etc.
• Visualization: display the structure of the data graphically.
  - Line graphs, pie charts, histograms, scatter plots, hierarchical graphs

Scatter Diagram
[figure]

Bayes Theorem
• Posterior probability: P(h1|xi)
• Prior probability: P(h1)
• Bayes Theorem:
    P(h1|xi) = P(xi|h1) P(h1) / P(xi)
• Assign probabilities of hypotheses given a data value.

Bayes Theorem Example
• Credit authorizations (hypotheses): h1 = authorize purchase, h2 = authorize after further identification, h3 = do not authorize, h4 = do not authorize but contact police
• Assign twelve data values for all combinations of credit and income:

    Credit \ Income   1     2     3     4
    Excellent         x1    x2    x3    x4
    Good              x5    x6    x7    x8
    Bad               x9    x10   x11   x12

• From training data: P(h1) = 60%; P(h2) = 20%; P(h3) = 10%; P(h4) = 10%.

Bayes Example (cont'd)
• Training data:

    ID   Income   Credit      Class   xi
    1    4        Excellent   h1      x4
    2    3        Good        h1      x7
    3    2        Excellent   h1      x2
    4    3        Good        h1      x7
    5    4        Good        h1      x8
    6    2        Excellent   h1      x2
    7    3        Bad         h2      x11
    8    2        Bad         h2      x10
    9    3        Bad         h3      x11
    10   1        Bad         h4      x9

Bayes Example (cont'd)
• Calculate P(xi|hj) and P(xi)
• Ex: P(x7|h1) = 2/6; P(x4|h1) = 1/6; P(x2|h1) = 2/6; P(x8|h1) = 1/6; P(xi|h1) = 0 for all other xi.
• Predict the class for x4:
  - Calculate P(hj|x4) for all hj.
  - Place x4 in the class with the largest value.
  - Ex:
    » P(h1|x4) = P(x4|h1) P(h1) / P(x4) = (1/6)(0.6)/0.1 = 1.
    » x4 in class h1.

Hypothesis Testing
• Find a model to explain behavior by creating and then testing a hypothesis about the data.
• Exact opposite of the usual DM approach.
• H0: null hypothesis; the hypothesis to be tested.
• H1: alternative hypothesis

Chi-Square Test
• One technique to perform hypothesis testing
• Used to test the association between two observed variable values and determine if a set of observed values is statistically different.
• The chi-squared statistic is defined as:
    χ^2 = Σ (O - E)^2 / E
• O: observed value
• E: expected value based on the hypothesis.

Chi-Square Test
• Given the average scores of five schools, determine whether the difference is statistically significant.
• Ex:
  - O = {50, 93, 67, 78, 87}
  - E = 75
  - χ^2 = 15.55 and therefore significant
• Examine a chi-squared significance table.
  - With 4 degrees of freedom and a significance level of 95%, the critical value is 9.488. Thus the variance between the schools' scores and the expected value cannot be associated with pure chance.

Regression
• Predict future values based on past values
• Fitting a set of points to a curve
• Linear regression assumes a linear relationship exists.
    y = c0 + c1 x1 + ... + cn xn
  - n input variables (called regressors or predictors)
  - One output variable, called the response
  - n+1 constants, chosen during the modeling process to match the input examples

Linear Regression -- with one input value
[figure]

Correlation
• Examine the degree to which the values for two variables behave similarly.
• Correlation coefficient r:
  - 1 = perfect correlation
  - -1 = perfect but opposite correlation
  - 0 = no correlation

Correlation
    r = Σ (xi - X̄)(yi - Ȳ) / sqrt( Σ (xi - X̄)^2 · Σ (yi - Ȳ)^2 )
• where X̄, Ȳ are the means for X and Y respectively.
• Suppose X = (1,3,5,7,9) and Y = (9,7,5,3,1): r = ?
• Suppose X = (1,3,5,7,9) and Y = (2,4,6,8,10): r = ?

Similarity Measures
• Determine similarity between two objects.
• Similarity characteristics:
• Alternatively, distance measures measure how unlike or dissimilar objects are.

Similarity Measures
[figure]

Distance Measures
• Measure dissimilarity between objects

Next Lecture:
• Data Mining techniques (II)
  - Decision trees, neural networks and genetic algorithms
• Reading assignments: Chapter 3
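
The jackknife definition (Q = g θ̂ - (g-1) θ̄ with m = 1 and g = n) and the two jackknife examples can be checked with a few lines of Python. This is only a sketch: the function names `biased_variance` and `jackknife` are made up for illustration and do not come from the slides.

```python
from statistics import mean


def biased_variance(xs):
    """Biased variance estimator: sum((x - mean)^2) / n."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def jackknife(xs, estimator):
    """Leave-one-out jackknife (m = 1, g = n): g*theta - (g-1)*theta_bar."""
    g = len(xs)
    theta = estimator(xs)
    # theta_j: the estimate computed with the j-th observation left out
    leave_one_out = [estimator(xs[:j] + xs[j + 1:]) for j in range(g)]
    theta_bar = mean(leave_one_out)
    return g * theta - (g - 1) * theta_bar


sample = [1, 4, 4]
jk_var = jackknife(sample, biased_variance)  # Example 2: 3(2) - 2(1.5) = 3
jk_mean = jackknife(sample, mean)            # Example 1: equals the plain mean
print(jk_var, jk_mean)
```

For the variance, the result matches s^2 = Σ (xi - x̄)^2 / (n - 1) = 6/2 = 3, while for the mean the jackknife leaves the estimate unchanged.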
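
The coin-toss MLE example can be reproduced numerically. The sketch below assumes independent tosses and uses a simple grid search over p (an illustrative choice, not a method stated in the slides) to confirm that the likelihood of {H,H,H,H,T} peaks at p = 4/5.

```python
def likelihood(p, tosses):
    """Likelihood of an observed toss sequence when P(H) = p (independent tosses)."""
    result = 1.0
    for t in tosses:
        result *= p if t == "H" else (1.0 - p)
    return result


tosses = ["H", "H", "H", "H", "T"]

l_fair = likelihood(0.5, tosses)    # perfect coin: 0.5^5
l_biased = likelihood(0.8, tosses)  # P(H) = 0.8: 0.8^4 * 0.2

# Grid search over p in {0.00, 0.01, ..., 1.00} for the maximizing value
grid = [i / 100 for i in range(101)]
p_hat = max(grid, key=lambda p: likelihood(p, tosses))

print(l_fair, l_biased, p_hat)
```

The grid search is only a check; for this model the maximum can be found in closed form as the fraction of heads.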
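
The credit-authorization Bayes example can be worked directly from the ten training tuples. The sketch below estimates P(hj), P(xi|hj) and P(xi) by counting, as the slides do, and uses exact fractions to avoid rounding; the variable and function names are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction

# (data value, class) pairs copied from the training-data table
training = [
    ("x4", "h1"), ("x7", "h1"), ("x2", "h1"), ("x7", "h1"), ("x8", "h1"),
    ("x2", "h1"), ("x11", "h2"), ("x10", "h2"), ("x11", "h3"), ("x9", "h4"),
]

n = len(training)
class_counts = Counter(h for _, h in training)
value_counts = Counter(x for x, _ in training)
joint_counts = Counter(training)


def posterior(h, x):
    """P(h|x) = P(x|h) P(h) / P(x), with all terms estimated by counting."""
    prior = Fraction(class_counts[h], n)
    cond = Fraction(joint_counts[(x, h)], class_counts[h])
    evidence = Fraction(value_counts[x], n)
    return cond * prior / evidence


posteriors = {h: posterior(h, "x4") for h in class_counts}
predicted = max(posteriors, key=posteriors.get)
print(posteriors, predicted)
```

With this training set, x4 occurs only with class h1, so P(h1|x4) = (1/6)(6/10)/(1/10) = 1 and x4 is placed in h1.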
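
The chi-squared statistic for the five school scores follows from χ^2 = Σ (O - E)^2 / E. A minimal check, taking the critical value 9.488 from the slide rather than computing it:

```python
observed = [50, 93, 67, 78, 87]
expected = 75  # the same expected score for every school

# Sum of squared deviations from the expected value, scaled by it
chi2 = sum((o - expected) ** 2 / expected for o in observed)

critical_value = 9.488  # 4 degrees of freedom, 95% level (from the slide)
significant = chi2 > critical_value

print(round(chi2, 2), significant)
```

The sum of squared deviations is 625 + 324 + 64 + 9 + 144 = 1166, and 1166/75 ≈ 15.55, which exceeds the critical value.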
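
The two "r = ?" correlation exercises can be answered with a short script. It assumes the correlation coefficient on the slide is the usual Pearson coefficient, r = Σ (xi - X̄)(yi - Ȳ) / sqrt(Σ (xi - X̄)^2 · Σ (yi - Ȳ)^2); the function name `correlation` is illustrative.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


X = (1, 3, 5, 7, 9)
r_opposite = correlation(X, (9, 7, 5, 3, 1))  # Y falls as X rises
r_same = correlation(X, (2, 4, 6, 8, 10))     # Y rises in step with X
print(r_opposite, r_same)
```

The first pair gives r = -1 (perfect but opposite correlation) and the second gives r = 1 (perfect correlation).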
