
Scikit-Learn: Classifiers - Binary

• Binary classification

  from sklearn.linear_model import SGDClassifier

  SGDClassifier(loss='hinge', penalty='l2', alpha=0.0001, l1_ratio=0.15,
      fit_intercept=True, max_iter=None, tol=None, shuffle=True, verbose=0,
      epsilon=0.1, n_jobs=None, random_state=None, learning_rate='optimal',
      eta0=0.0, power_t=0.5, early_stopping=False, validation_fraction=0.1,
      n_iter_no_change=5, class_weight=None, warm_start=False, average=False,
      n_iter=None)

  – This implements linear classifiers (e.g., linear SVM, logistic regression, among others) trained with stochastic gradient descent
  – For best results with the default learning rate schedule, the data should have zero mean and unit variance
  – Expects floating-point values for the features
  – Parameters:
    ∗ loss (string)
      · The loss function used to measure error
      · Determines the model used:
        'hinge': linear SVM
        'log': logistic regression
        'modified_huber'
        'squared_hinge'
        'perceptron'
        'squared_loss'
        'epsilon_insensitive'
        'squared_epsilon_insensitive'
    ∗ penalty (string)
      · Type of regularization
    ∗ alpha (float)
      · Regularization term multiplier
      · Also affects the learning rate when learning_rate is set to 'optimal'
    ∗ l1_ratio (float)
      · Elastic Net mixing parameter
    ∗ fit_intercept (bool)
      · Whether the intercept should be estimated or not
      · If False, the data is assumed to be already centered
    ∗ max_iter (int)
      · Maximum number of passes over the training data
    ∗ tol (float)
      · Stopping criterion
      · If it is not None, the iterations stop when loss > previous_loss - tol
    ∗ shuffle (bool)
      · Whether or not the training data should be shuffled after each epoch
    ∗ epsilon (float)
      · Epsilon value in the 'huber', 'epsilon_insensitive', and 'squared_epsilon_insensitive' loss functions
    ∗ eta0 (double)
      · Initial learning rate
    ∗ learning_rate (string)
      · Learning rate schedule:
        'constant': η = eta0
        'optimal': η = 1.0 / (α · (t + t0)), where t0 is chosen by a heuristic
        'invscaling': η = eta0 / pow(t, power_t)
        'adaptive': η = eta0 as long as the training loss keeps decreasing
    ∗ power_t (double)
      · Exponent for the 'invscaling' schedule
    ∗ early_stopping (bool)
      · If True, automatically sets aside a fraction of the training data as validation and terminates training when the validation score does not improve by at least tol for n_iter_no_change consecutive epochs
    ∗ validation_fraction (float)
      · Proportion of training data to set aside as the validation set for early stopping
    ∗ n_iter_no_change (int)
      · Number of iterations with no improvement to wait before early stopping
    ∗ average (bool or int)
      · When set to True, computes the averaged SGD weights and stores the result in the coef_ attribute
      · If set to an int greater than 1, averaging begins once the total number of samples seen reaches average
    ∗ n_iter (int)
      · Number of passes over the training data (deprecated)
  – Attributes:
    ∗ coef_ (array)
      · Weights assigned to the features
    ∗ intercept_ (array)
      · Constants in the decision function
    ∗ n_iter_
    ∗ loss_function_
  – Methods (beyond fit() and predict()):
    ∗ decision_function(X)
      · Predicts confidence scores for samples
    ∗ partial_fit(X, y, classes=None, sample_weight=None)
      · Performs one epoch of stochastic gradient descent on the given samples
    ∗ score(X, y, sample_weight=None)
      · Returns the mean accuracy on the given test data and labels
    ∗ predict_proba
      · Probability estimates
      · Only available for log loss and modified Huber loss
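  – A minimal usage sketch tying these parameters together; the synthetic data, train/test split, and specific hyperparameter values below are illustrative assumptions, not part of these notes:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # Illustrative synthetic binary data (assumed for the example)
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Scale to zero mean / unit variance, as recommended for the default
    # 'optimal' learning rate schedule
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # loss='hinge' gives a linear SVM; loss='log' would give logistic regression
    sgd_clf = SGDClassifier(loss='hinge', penalty='l2', alpha=0.0001,
                            max_iter=1000, tol=1e-3, random_state=42)
    sgd_clf.fit(X_train, y_train)

    print(sgd_clf.score(X_test, y_test))           # mean accuracy
    print(sgd_clf.decision_function(X_test[:5]))   # confidence scores

Scikit-Learn: Classifiers - Multiclass and Multilabel

1. Note: All classifiers in scikit-learn do multiclass classification out-of-the-box
   • Use the module sklearn.multiclass if you want to experiment with different multiclass strategies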
2. Multiclass classification:
   • Classification task with more than two classes
   • Assumes that each sample is assigned to one and only one label
3. Multilabel classification:
   • Each sample is assigned a set of target labels
   • For when labels are not mutually exclusive
4. Multioutput regression:
   • Each sample is assigned a set of target values
5. Multioutput-multiclass classification and multi-task classification:
   • A single estimator has to handle several joint classification tasks
6. Classes:
   (a) One-vs-One multiclass classification

       from sklearn.multiclass import OneVsOneClassifier

       OneVsOneClassifier(estimator, n_jobs=None)

       • Parameters:
         – Self-evident
       • Methods:
         – See above
       • Attributes:
         – estimators_
         – classes_
       • See also OneVsRestClassifier, a multiclass/multilabel classifier

   (b) Multilabel classification

       from sklearn.neighbors import KNeighborsClassifier

       KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto',
           leaf_size=30, p=2, metric='minkowski', metric_params=None,
           n_jobs=None, **kwargs)

       • Parameters:
         – n_neighbors (int)
           ∗ Number of neighbors to use by default for kneighbors queries
         – weights (string, callable)
           ∗ Weight function used in prediction:
             'uniform': all points in each neighborhood are weighted equally
             'distance': weight points by the inverse of their distance; closer neighbors have a greater influence than farther ones
             callable: a user-defined function
         – algorithm (string)
           ∗ Algorithm used to compute the nearest neighbors:
             'ball_tree'
             'kd_tree'
             'brute': brute force
             'auto': decides the most appropriate algorithm based on the values passed to the fit method
         – leaf_size (int)
           ∗ Leaf size passed to the two tree algorithms
         – p (int)
           ∗ Power parameter for the Minkowski metric
         – metric (string, callable)
           ∗ ***
         – metric_params (dictionary)
           ∗ Additional keyword arguments for the metric function
       • Methods:
         – kneighbors(X=None, n_neighbors=None, return_distance=True)
           ∗ Finds the K-neighbors of a point
           ∗ Returns indices of and distances to the neighbors of each point
         – kneighbors_graph(X=None, n_neighbors=None, mode='connectivity')
           ∗ Computes the (weighted) graph of k-neighbors for points in X
           ∗ Returns a sparse matrix in CSR format, shape = [n_samples, n_samples_fit]

Scikit-Learn: Binary Classification

• Reference: Géron, Chapter 3
• MNIST corpus
  – Images of handwritten digits
  – Already divided into training and test sets
  – Each image is 28 × 28 pixels
    ∗ Results in 784 features (one for each pixel)
    ∗ Values range from 0 (white) to 255 (black)
• Recommended to shuffle the training set before using it, especially since the digits in MNIST are listed in order
• SGDClassifier: see Learning Models in Scikit notes
• For more control over cross-validation than Scikit models provide, do it manually (see StratifiedKFold in Cross-Validation in Scikit notes):

  from sklearn.model_selection import StratifiedKFold
  from sklearn.base import clone

  # shuffle=True is required by newer scikit-learn versions when random_state is set
  skfolds = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

  for train_index, test_index in skfolds.split(X_train, y_train_5):
      clone_clf = clone(sgd_clf)
      X_train_folds = X_train[train_index]
      y_train_folds = y_train_5[train_index]
      X_test_fold = X_train[test_index]
      y_test_fold = y_train_5[test_index]

      clone_clf.fit(X_train_folds, y_train_folds)
      y_pred = clone_clf.predict(X_test_fold)
      n_correct = sum(y_pred == y_test_fold)
      print(n_correct / len(y_pred))

  – The above is equivalent to cross_val_score() (see the sketch below)
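  – A minimal sketch of that equivalent call, reusing the names from the loop above (sgd_clf, X_train, y_train_5):

    from sklearn.model_selection import cross_val_score

    # Same 3-fold stratified CV as the manual loop, scored on accuracy
    print(cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy"))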
• Note that the accuracy is potentially a false high
  – MNIST is skewed: only about 10% of the samples are any given digit (when doing binary testing for "is x the digit y or not?")

Scikit-Learn: Binary Classification - Tuning

• Rather than evaluating on accuracy, use the confusion matrix
  – A confusion matrix is a special type of contingency table that illustrates how well a classifier performs
  – So called because it helps to determine whether the classifier is confusing two classes
  – For example:

                   prediction
                    A   B
      category  A   8   2
                B   6   4

      OR

                   prediction
                    A   B   C
      category  A   5   3   0
                B   2   3   1
                C   0   2  11

  – A table of confusion (also called a confusion matrix) illustrates how well a classifier performs by showing true positives, false positives, true negatives, and false negatives
    ∗ In the first example above:

                       prediction
                        A        not A
        category  A     8 (TP)   2 (FN)
                  not A 6 (FP)   4 (TN)

    ∗ And for the second example above:

                       prediction
                        A        not A
        category  A     5 (TP)   3 (FN)
                  not A 2 (FP)  17 (TN)

  – To get the predictions, use cross_val_predict() (see Cross-Validation in Scikit notes; cross_val_score() returns scores rather than predictions)
  – Then use the function confusion_matrix()

    from sklearn.metrics import confusion_matrix

    ∗ confusion_matrix(y_true, y_pred, labels=None, sample_weight=None)
    ∗ Parameters:
      · y_true: Correct target values
      · y_pred: Estimated target values
      · labels: List of labels to index the matrix
        If omitted, the values that appear at least once in y_true or y_pred are used (in sorted order)
        Labels can be used to select a subset
      · sample_weight: Sample weights
    ∗ Returns an array
    ∗ Examples (from the API):

      >>> from sklearn.metrics import confusion_matrix
      >>> y_true = [2, 0, 2, 2, 0, 1]
      >>> y_pred = [0, 0, 2, 2, 0, 2]
      >>> confusion_matrix(y_true, y_pred)
      array([[2, 0, 0],
             [0, 0, 1],
             [1, 0, 2]])
      >>> y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
      >>> y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
      >>> confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
      array([[2, 0, 0],
             [0, 0, 1],
             [1, 0, 2]])
      >>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1],
      ...                                   [1, 1, 1, 0]).ravel()
      >>> (tn, fp, fn, tp)
      (0, 2, 1, 1)

    · Note: The results are represented as in the above examples: the predicted values are the columns, the actual values the rows
      So in the first example, array[0, 0] indicates that 2 zeroes were correctly labeled as zeroes, array[1, 0] that no ones were labeled as zeroes, and array[2, 0] that one two was labeled as a zero

• Another set of metrics to look at is the precision and recall of the classifier
  – Precision:
    ∗ Measures the accuracy of the positive predictions

      precision = true positives / (true positives + false positives)
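    ∗ A quick sketch of that formula in code; the y_true/y_pred values below are made up purely for illustration:

      from sklearn.metrics import confusion_matrix, precision_score

      y_true = [0, 1, 1, 0, 1, 1, 0, 1]
      y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

      # Precision computed directly from the definition ...
      tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
      print(tp / (tp + fp))                    # 0.8

      # ... and the same value via scikit-learn
      print(precision_score(y_true, y_pred))   # 0.8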