Python for Data Science Cheat Sheet Scikit-Learn

Python For Data Science Cheat Sheet Create Your Model Evaluate Your Model’s Performance Scikit-Learn Supervised Learning Estimators Classification Metrics Learn Python for data science Interactively at www.DataCamp.com Linear Regression Accuracy Score Estimator score method >>> from sklearn.linear_model import LinearRegression >>> knn.score(X_test, y_test) >>> lr = LinearRegression(normalize=True) >>> from sklearn.metrics import accuracy_score Metric scoring functions Support Vector Machines (SVM) >>> accuracy_score(y_test, y_pred) Scikit-learn >>> from sklearn.svm import SVC Classification Report >>> svc = SVC(kernel='linear') >>> from sklearn.metrics import classification_report Precision, recall, f1-score Scikit-learn is an open source Python library that Naive Bayes >>> print(classification_report(y_test, y_pred)) and support implements a range of machine learning, Confusion Matrix >>> from sklearn.naive_bayes import GaussianNB preprocessing, cross-validation and visualization >>> from sklearn.metrics import confusion_matrix >>> gnb = GaussianNB() >>> print(confusion_matrix(y_test, y_pred)) algorithms using a unified interface. KNN Regression Metrics A Basic Example >>> from sklearn import neighbors >>> knn = neighbors.KNeighborsClassifier(n_neighbors=5) Mean Absolute Error >>> from sklearn import neighbors, datasets, preprocessing Unsupervised Learning Estimators >>> from sklearn.model_selection import train_test_split >>> from sklearn.metrics import mean_absolute_error >>> from sklearn.metrics import accuracy_score >>> y_true = [3, -0.5, 2] >>> iris = datasets.load_iris() Principal Component Analysis (PCA) >>> mean_absolute_error(y_true, y_pred) >>> X, y = iris.data[:, :2], iris.target >>> from sklearn.decomposition import PCA Mean Squared Error >>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33) >>> pca = PCA(n_components=0.95) >>> from sklearn.metrics import mean_squared_error >>> scaler = preprocessing.StandardScaler().fit(X_train) K Means >>> mean_squared_error(y_test, y_pred) >>> X_train = scaler.transform(X_train) R² Score >>> X_test = scaler.transform(X_test) >>> from sklearn.cluster import KMeans >>> knn = neighbors.KNeighborsClassifier(n_neighbors=5) >>> k_means = KMeans(n_clusters=3, random_state=0) >>> from sklearn.metrics import r2_score >>> knn.fit(X_train, y_train) >>> r2_score(y_true, y_pred) >>> y_pred = knn.predict(X_test) Clustering Metrics >>> accuracy_score(y_test, y_pred) Model Fitting Adjusted Rand Index Supervised learning Also see NumPy & Pandas Fit the model to the data >>> from sklearn.metrics import adjusted_rand_score Loading The Data >>> lr.fit(X, y) >>> adjusted_rand_score(y_true, y_pred) >>> knn.fit(X_train, y_train) Your data needs to be numeric and stored as NumPy arrays or SciPy sparse >>> svc.fit(X_train, y_train) Homogeneity matrices. Other types that are convertible to numeric arrays, such as Pandas Unsupervised Learning >>> from sklearn.metrics import homogeneity_score Fit the model to the data >>> homogeneity_score(y_true, y_pred) DataFrame, are also acceptable. >>> k_means.fit(X_train) V-measure >>> pca_model = pca.fit_transform(X_train) Fit to data, then transform it >>> import numpy as np >>> from sklearn.metrics import v_measure_score >>> X = np.random.random((10,5)) >>> metrics.v_measure_score(y_true, y_pred) >>> y = np.array(['M','M','F','F','M','F','M','M','F','F','F']) >>> X[X < 0.7] = 0 Prediction Cross-Validation Supervised Estimators >>> from sklearn.cross_validation import cross_val_score Predict labels >>> print(cross_val_score(knn, X_train, y_train, cv=4)) Training And Test Data >>> y_pred = svc.predict(np.random.random((2,5))) >>> print(cross_val_score(lr, X, y, cv=2)) >>> y_pred = lr.predict(X_test) Predict labels >>> from sklearn.model_selection import train_test_split >>> y_pred = knn.predict_proba(X_test) Estimate probability of a label > > > X_train, X_test, y_train, y_test = train_test_split(X , Unsupervised Estimators Tune Your Model y, Predict labels in clustering algos random_state=0) >>> y_pred = k_means.predict(X_test) Grid Search >>> from sklearn.grid_search import GridSearchCV >>> params = {"n_neighbors": np.arange(1,3), Preprocessing The Data "metric": ["euclidean", "cityblock"]} >>> grid = GridSearchCV(estimator=knn, Standardization Encoding Categorical Features param_grid=params) >>> grid.fit(X_train, y_train) >>> from sklearn.preprocessing import StandardScaler >>> from sklearn.preprocessing import LabelEncoder >>> print(grid.best_score_) >>> print(grid.best_estimator_.n_neighbors) >>> scaler = StandardScaler().fit(X_train) >>> enc = LabelEncoder() >>> standardized_X = scaler.transform(X_train) >>> y = enc.fit_transform(y) >>> standardized_X_test = scaler.transform(X_test) Randomized Parameter Optimization Normalization Imputing Missing Values >>> from sklearn.grid_search import RandomizedSearchCV >>> params = {"n_neighbors": range(1,5), >>> from sklearn.preprocessing import Normalizer "weights": ["uniform", "distance"]} >>> from sklearn.preprocessing import Imputer >>> rsearch = RandomizedSearchCV(estimator=knn, >>> scaler = Normalizer().fit(X_train) >>> imp = Imputer(missing_values=0, strategy='mean', axis=0) param_distributions=params, >>> normalized_X = scaler.transform(X_train) >>> imp.fit_transform(X_train) cv=4, >>> normalized_X_test = scaler.transform(X_test) n_iter=8, random_state=5) Binarization Generating Polynomial Features >>> rsearch.fit(X_train, y_train) >>> print(rsearch.best_score_) >>> from sklearn.preprocessing import Binarizer >>> from sklearn.preprocessing import PolynomialFeatures >>> binarizer = Binarizer(threshold=0.0).fit(X) >>> poly = PolynomialFeatures(5) DataCamp >>> binary_X = binarizer.transform(X) >>> poly.fit_transform(X) Learn Python for Data Science Interactively.

Python for Data Science Cheat Sheet Scikit-Learn

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support