CatBoost cheat sheet

Important CatBoost classes clf.grid_search(params # dict, X # Pool or list, # CatBoost feature data type y # list or none (if Pool), from catboost import Pool cv # cv function or no. of folds) # cross-validation generator ) from catboost import cv # classifier # compare two CatBoost models from catboost import CatBoostClassifier clf.compare(otherCBModel, # regressor evalDataset, from catboost import CatBoostRegressor metric # list of metric(s)) # general purpose (e.g. ranking) from catboost import CatBoost Model I/O

Pool Save (export) models train_set= Pool(data, # data # Python format labels, # labels clf.save_model(filename, format="python") # indices/column names of # categorical features # C++ format cat_features, clf.save_model(filename, format="CPP") # workers to read from files: thread_count=-1 # JSON format # ... more functions clf.save_model(filename, format="json") # to define how inpute # data should be handled # PMML format ) clf.save_model(filename, format="pmml")

# CoreML format Classifier/Regressor options clf.save_model(filename, format="coreml", export_parameters={ params={ 'iterations': # int, 'prediction_type': predictionType 'learning_rate': # float, }) 'depth': # range [1,16], 'task_type': 'GPU' # CPU/GPU, # Java, Rust (CatBoost binary model format) '': clf.save_model(filename, format="cbm") ) clf= CatBoostClassifier(params) Load (import) models

Attributes clf= CatBoostClassifier() # Python format # similar to scikit-learn clf.load_model(filename, format="python") clf= CatBoostClassifier() clf.fit() # C++ format clf.predict() clf.load_model(filename, format="CPP") clf.predict_proba() # CoreML format # CatBoost-specific clf.load_model(filename, format="AppleCoreML") # cross-validation # (can be passed to grid_search) # CatBoost binary format scores= cv(train_set # pool, clf.load_model(filename, format="cbm") params # model params, fold_count # (int) folds, shuffle # bool, Other hints stratified # bool, ) • Many attributes have a option called “plot” to plot processes/metrcs/etc.

# gridsearch for hyperparameter tuning • Do not use One-Hot-Encoding with CatBoost.

(c) Simon Wenkel (https://www.simonwenkel.com) This pdf is licensed under the CC BY-SA 4.0 license