
Scikit-Learn: Classifiers - Binary

• Binary classification

  from sklearn.linear_model import SGDClassifier

  SGDClassifier(loss='hinge', penalty='l2', alpha=0.0001, l1_ratio=0.15,
      fit_intercept=True, max_iter=None, tol=None, shuffle=True, verbose=0,
      epsilon=0.1, n_jobs=None, random_state=None, learning_rate='optimal',
      eta0=0.0, power_t=0.5, early_stopping=False, validation_fraction=0.1,
      n_iter_no_change=5, class_weight=None, warm_start=False, average=False,
      n_iter=None)

  – This implements linear classifiers (e.g., linear SVM, logistic regression, among others) trained with stochastic gradient descent
  – For best results with the default learning rate schedule, the data should have zero mean and unit variance
  – Expects floating-point values for the features
  – Parameters:
    ∗ loss (string)
      · The loss function used to measure error
      · Determines the model used:
        'hinge': linear SVM
        'log': logistic regression
        'modified_huber'
        'squared_hinge'
        'perceptron'
        'squared_loss'
        'epsilon_insensitive'
        'squared_epsilon_insensitive'
    ∗ penalty (string)
      · Type of regularization
    ∗ alpha (float)
      · Regularization term multiplier
      · Also affects the learning rate when learning_rate is set to 'optimal'
    ∗ l1_ratio (float)
      · Elastic Net mixing parameter
    ∗ fit_intercept (bool)
      · Whether the intercept should be estimated or not
      · If False, the data is assumed to be already centered
    ∗ max_iter (int)
      · Maximum number of passes over the training data
    ∗ tol (float)
      · Stopping criterion
      · If it is not None, the iterations stop when loss > previous_loss - tol
    ∗ shuffle (bool)
      · Whether or not the training data should be shuffled after each epoch
    ∗ epsilon (float)
      · Epsilon value in the 'huber', 'epsilon_insensitive', and 'squared_epsilon_insensitive' loss functions
    ∗ eta0 (double)
      · Initial learning rate
    ∗ learning_rate (string)
      · Learning rate schedule:
        'constant': η = eta0
        'optimal': η = 1.0 / (α · (t + t0)), where t0 is chosen by a heuristic
        'invscaling': η = eta0 / pow(t, power_t)
        'adaptive': η = eta0 as long as the training loss keeps decreasing
    ∗ power_t (double)
      · Exponent for the 'invscaling' schedule
    ∗ early_stopping (bool)
      · If True, automatically sets aside a fraction of the training data as validation and terminates training when the validation score does not improve by at least tol for n_iter_no_change consecutive epochs
    ∗ validation_fraction (float)
      · Proportion of training data to set aside as the validation set for early stopping
    ∗ n_iter_no_change (int)
      · Number of iterations with no improvement to wait before early stopping
    ∗ average (bool or int)
      · When set to True, computes the averaged SGD weights and stores the result in the coef_ attribute
      · If set to an int greater than 1, averaging begins once the total number of samples seen reaches average
    ∗ n_iter (int)
      · Number of passes over the training data (deprecated)
  – Attributes:
    ∗ coef_ (array)
      · Weights assigned to the features
    ∗ intercept_ (array)
      · Constants in the decision function
    ∗ n_iter_
    ∗ loss_function_
  – Methods (beyond fit() and predict()):
    ∗ decision_function(X)
      · Predicts confidence scores for samples
    ∗ partial_fit(X, y, classes=None, sample_weight=None)
      · Performs one epoch of stochastic gradient descent on the given samples
    ∗ score(X, y, sample_weight=None)
      · Returns the mean accuracy on the given test data and labels
    ∗ predict_proba
      · Probability estimates
      · Only available for log loss and modified Huber loss
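  – A minimal usage sketch tying these parameters together; the synthetic data, train/test split, and specific hyperparameter values below are illustrative assumptions, not part of these notes:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # Illustrative synthetic binary data (assumed for the example)
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Scale to zero mean / unit variance, as recommended for the default
    # 'optimal' learning rate schedule
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # loss='hinge' gives a linear SVM; loss='log' would give logistic regression
    sgd_clf = SGDClassifier(loss='hinge', penalty='l2', alpha=0.0001,
                            max_iter=1000, tol=1e-3, random_state=42)
    sgd_clf.fit(X_train, y_train)

    print(sgd_clf.score(X_test, y_test))           # mean accuracy
    print(sgd_clf.decision_function(X_test[:5]))   # confidence scores

Scikit-Learn: Classifiers - Multiclass and Multilabel

1. Note: All classifiers in scikit-learn do multiclass classification out-of-the-box
   • Use the module sklearn.multiclass if you want to experiment with different multiclass strategies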
2. Multiclass classification:
   • Classification task with more than two classes
   • Assumes that each sample is assigned to one and only one label
3. Multilabel classification:
   • Each sample is assigned a set of target labels
   • For when labels are not mutually exclusive
4. Multioutput regression:
   • Each sample is assigned a set of target values
5. Multioutput-multiclass classification and multi-task classification:
   • A single estimator has to handle several joint classification tasks
6. Classes:
   (a) One-vs-One multiclass classification

       from sklearn.multiclass import OneVsOneClassifier

       OneVsOneClassifier(estimator, n_jobs=None)

       • Parameters:
         – Self-evident
       • Methods:
         – See above
       • Attributes:
         – estimators_
         – classes_
       • See also OneVsRestClassifier, a multiclass/multilabel classifier

   (b) Multilabel classification

       from sklearn.neighbors import KNeighborsClassifier

       KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto',
           leaf_size=30, p=2, metric='minkowski', metric_params=None,
           n_jobs=None, **kwargs)

       • Parameters:
         – n_neighbors (int)
           ∗ Number of neighbors to use by default for kneighbors queries
         – weights (string, callable)
           ∗ Weight function used in prediction:
             'uniform': all points in each neighborhood are weighted equally
             'distance': weight points by the inverse of their distance; closer neighbors have a greater influence than farther ones
             callable: a user-defined function
         – algorithm (string)
           ∗ Algorithm used to compute the nearest neighbors:
             'ball_tree'
             'kd_tree'
             'brute': brute force
             'auto': decides the most appropriate algorithm based on the values passed to the fit method
         – leaf_size (int)
           ∗ Leaf size passed to the two tree algorithms
         – p (int)
           ∗ Power parameter for the Minkowski metric
         – metric (string, callable)
           ∗ ***
         – metric_params (dictionary)
           ∗ Additional keyword arguments for the metric function
       • Methods:
         – kneighbors(X=None, n_neighbors=None, return_distance=True)
           ∗ Finds the K-neighbors of a point
           ∗ Returns indices of and distances to the neighbors of each point
         – kneighbors_graph(X=None, n_neighbors=None, mode='connectivity')
           ∗ Computes the (weighted) graph of k-neighbors for points in X
           ∗ Returns a sparse matrix in CSR format, shape = [n_samples, n_samples_fit]

Scikit-Learn: Binary Classification

• Reference: Géron, Chapter 3
• MNIST corpus
  – Images of handwritten digits
  – Already divided into training and test sets
  – Each image is 28 × 28 pixels
    ∗ Results in 784 features (one for each pixel)
    ∗ Values range from 0 (white) to 255 (black)
• Recommended to shuffle the training set before using it, especially since the digits in MNIST are listed in order
• SGDClassifier: see Learning Models in Scikit notes
• For more control over cross-validation than Scikit models provide, do it manually (see StratifiedKFold in Cross-Validation in Scikit notes):

  from sklearn.model_selection import StratifiedKFold
  from sklearn.base import clone

  # shuffle=True is required by newer scikit-learn versions when random_state is set
  skfolds = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

  for train_index, test_index in skfolds.split(X_train, y_train_5):
      clone_clf = clone(sgd_clf)
      X_train_folds = X_train[train_index]
      y_train_folds = y_train_5[train_index]
      X_test_fold = X_train[test_index]
      y_test_fold = y_train_5[test_index]

      clone_clf.fit(X_train_folds, y_train_folds)
      y_pred = clone_clf.predict(X_test_fold)
      n_correct = sum(y_pred == y_test_fold)
      print(n_correct / len(y_pred))

  – The above is equivalent to cross_val_score() (see the sketch below)
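  – A minimal sketch of that equivalent call, reusing the names from the loop above (sgd_clf, X_train, y_train_5):

    from sklearn.model_selection import cross_val_score

    # Same 3-fold stratified CV as the manual loop, scored on accuracy
    print(cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy"))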
• Note that the accuracy is potentially a false high
  – MNIST is skewed: only about 10% of the samples are any given digit (when doing binary testing for "is x the digit y or not?")

Scikit-Learn: Binary Classification - Tuning

• Rather than evaluating on accuracy, use the confusion matrix
  – A confusion matrix is a special type of contingency table that illustrates how well a classifier performs
  – So called because it helps to determine whether the classifier is confusing two classes
  – For example:

                   prediction
                    A   B
      category  A   8   2
                B   6   4

      OR

                   prediction
                    A   B   C
      category  A   5   3   0
                B   2   3   1
                C   0   2  11

  – A table of confusion (also called a confusion matrix) illustrates how well a classifier performs by showing true positives, false positives, true negatives, and false negatives
    ∗ In the first example above:

                       prediction
                        A        not A
        category  A     8 (TP)   2 (FN)
                  not A 6 (FP)   4 (TN)

    ∗ And for the second example above:

                       prediction
                        A        not A
        category  A     5 (TP)   3 (FN)
                  not A 2 (FP)  17 (TN)

  – To get the predictions, use cross_val_predict() (see Cross-Validation in Scikit notes; cross_val_score() returns scores rather than predictions)
  – Then use the function confusion_matrix()

    from sklearn.metrics import confusion_matrix

    ∗ confusion_matrix(y_true, y_pred, labels=None, sample_weight=None)
    ∗ Parameters:
      · y_true: Correct target values
      · y_pred: Estimated target values
      · labels: List of labels to index the matrix
        If omitted, the values that appear at least once in y_true or y_pred are used (in sorted order)
        Labels can be used to select a subset
      · sample_weight: Sample weights
    ∗ Returns an array
    ∗ Examples (from the API):

      >>> from sklearn.metrics import confusion_matrix
      >>> y_true = [2, 0, 2, 2, 0, 1]
      >>> y_pred = [0, 0, 2, 2, 0, 2]
      >>> confusion_matrix(y_true, y_pred)
      array([[2, 0, 0],
             [0, 0, 1],
             [1, 0, 2]])
      >>> y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
      >>> y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
      >>> confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
      array([[2, 0, 0],
             [0, 0, 1],
             [1, 0, 2]])
      >>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1],
      ...                                   [1, 1, 1, 0]).ravel()
      >>> (tn, fp, fn, tp)
      (0, 2, 1, 1)

    · Note: The results are represented as in the above examples: the predicted values are the columns, the actual values the rows
      So in the first example, array[0, 0] indicates that 2 zeroes were correctly labeled as zeroes, array[1, 0] that no ones were labeled as zeroes, and array[2, 0] that one two was labeled as a zero

• Another set of metrics to look at is the precision and recall of the classifier
  – Precision:
    ∗ Measures the accuracy of the positive predictions

      precision = true positives / (true positives + false positives)
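    ∗ A quick sketch of that formula in code; the y_true/y_pred values below are made up purely for illustration:

      from sklearn.metrics import confusion_matrix, precision_score

      y_true = [0, 1, 1, 0, 1, 1, 0, 1]
      y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

      # Precision computed directly from the definition ...
      tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
      print(tp / (tp + fp))                    # 0.8

      # ... and the same value via scikit-learn
      print(precision_score(y_true, y_pred))   # 0.8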