Discussing the article: "Quantization in machine learning (Part 2): Data preprocessing, table selection, training CatBoost models"

 

Check out the new article: Quantization in machine learning (Part 2): Data preprocessing, table selection, training CatBoost models.

The article covers the practical application of quantization to building tree models. It discusses methods for selecting quantum tables and for preprocessing data. No complex mathematical equations are used.

Let's look at the data preprocessing methods I have implemented by walking through the functionality of the Q_Error_Selection script.

In brief, the "Q_Error_Selection" script loads a sample from the "train.csv" file, transfers its contents into a matrix, preprocesses the data, then loads the quantum tables one at a time and, for each predictor, assesses the error of the recovered data relative to the original data. The assessment results for each quantum table are saved into an array. After checking all the options, the script builds a summary table of errors for each predictor and selects the best quantum table for each predictor according to a given criterion. It then creates and saves a summary quantum table and a file with CatBoost settings, to which the predictors excluded from training are added by the serial numbers of their columns. It also creates accompanying files, depending on the selected script settings.
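To make the selection step more concrete, here is a rough Python sketch of the idea; it is not the Q_Error_Selection script itself. It assumes that a quantum table is a set of sorted borders per predictor, that a quantized value is "recovered" as the mean of the sample values falling into the same interval, and that the error is the mean absolute deviation from the original. The function names `quantize_and_recover` and `select_tables` are hypothetical, and the actual script may use a different recovery rule and selection criterion.

```python
import numpy as np


def quantize_and_recover(values, borders):
    """Assign each value to an interval between sorted borders, then
    recover it as the mean of the values in that interval (an assumption)."""
    borders = np.sort(np.asarray(borders, dtype=float))
    # Interval index for every value: 0 .. len(borders)
    bins = np.searchsorted(borders, values, side="right")
    recovered = np.empty_like(values, dtype=float)
    for b in np.unique(bins):
        mask = bins == b
        recovered[mask] = values[mask].mean()
    return recovered


def select_tables(X, candidate_tables):
    """X: matrix (n_samples, n_predictors).
    candidate_tables: list of tables; each table is a list holding one
    border array per predictor.
    Returns the error matrix (n_tables, n_predictors) and, for each
    predictor, the index of the table with the smallest error."""
    n_tables, n_pred = len(candidate_tables), X.shape[1]
    errors = np.zeros((n_tables, n_pred))
    for t, table in enumerate(candidate_tables):
        for j in range(n_pred):
            recovered = quantize_and_recover(X[:, j], table[j])
            # Mean absolute error of recovered vs. original data
            errors[t, j] = np.mean(np.abs(X[:, j] - recovered))
    best = errors.argmin(axis=0)
    return errors, best
```

The error matrix plays the role of the summary table of errors, and `best` picks one candidate table per predictor. Taking the minimum error is only one possible criterion; a different rule can be substituted at the `argmin` step.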

Author: Aleksey Vyazmikin
