Machine learning in trading: theory, models, practice and algo-trading - page 101

 
Dr.Trader:

Much of this you already know and have done yourself, but I will spell it out in full, to rule out cases where we describe the same thing the same way but actually do it differently.


All of this comes from advice on this forum, and from experience.

Thank you.

I will answer point by point.

You need a carefully honed fitness function for evaluating model parameters. If the function gives a specific set of model parameters and selected predictors a high score, then the test on new data should also give good results.
(For any estimate below I assume that higher is better.)

There is such a function. I don't need R^2 (although at the beginning my aim was simply to confirm that the market is plainly predictable in regression terms). I use a hand-written function that estimates the expected payoff (MO) and the total points earned net of the spread. This function runs inside the CV loop and selects the best parameters for training the model.

The fitness function should loop at least 50 times over the following:
1) Divide the data into 2 parts, 50%/50%. Use both random sampling with sample() and sequential sampling (the chunk for training somewhere in the middle, and for validation whatever surrounds it at the beginning and end of the raw data), including the extreme variants where training is on the first half of the table and validation on the last, and vice versa. I believe it is important for training to have both sequentially selected samples and random ones. The picture shows some examples of random and sequential partitioning more clearly: the green lines are for training, the yellow ones for validation.
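A minimal sketch of the two partitioning schemes described in this point, assuming a hypothetical trainData data frame with the raw rows ordered oldest-first:

# toy stand-in for the raw data, oldest row first
trainData <- data.frame(x = rnorm(1000), target = rnorm(1000))
n    <- nrow(trainData)
half <- round(n * 0.5)

# random 50/50 split with sample()
rowsTrainRandom    <- sample(n, half)
rowsValidateRandom <- setdiff(1:n, rowsTrainRandom)

# sequential 50/50 split: a contiguous block (somewhere in the middle) for training,
# everything before and after it for validation; blockStart = 1 and blockStart = n - half + 1
# give the two extreme variants (first half for training / last half for training)
blockStart      <- sample(1:(n - half + 1), 1)
rowsTrainSeq    <- blockStart:(blockStart + half - 1)
rowsValidateSeq <- setdiff(1:n, rowsTrainSeq)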

It's not really clear. I just don't understand what repetitions you are referring to.

Again, I have 99 training sets (10 years each). Each set is unique in terms of examples, but they are all from the same "before" time period.

On each set, I run CV with 2 to 10 folds (I loop over the fold count as well).

But all my folds are strictly separated in time. I think this is the invariably correct method.

So 99 * the number of CV parameter combinations (hundreds) = hundreds of trained models that show the best quality metric on the validation folds.

Then there is the same number of deferred samples, taken from the "after" time period (5 years). They are also unique. I do a kind of nested CV: each model obtained in CV is additionally checked on the deferred sample. I get two vectors of quality-metric values: one on CV, one on the deferred sample.
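A minimal sketch of one way to build such strictly time-separated folds (contiguous blocks of time-ordered rows, no shuffling); the data frame and fold count here are hypothetical:

# hypothetical time-ordered data, oldest row first
dat    <- data.frame(x = rnorm(1000), target = rnorm(1000))
nFolds <- 5   # the text loops over 2 to 10

# cut the row indices into nFolds contiguous, time-ordered blocks
foldId <- cut(seq_len(nrow(dat)), breaks = nFolds, labels = FALSE)

for (k in seq_len(nFolds)) {
    validateRows <- which(foldId == k)   # one contiguous time block held out
    trainRows    <- which(foldId != k)   # all remaining time blocks
    # (a stricter variant would train only on blocks that lie before the held-out one)
    # fit the model on dat[trainRows, ] and score it on dat[validateRows, ]
}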

A question for the experts:

if my data is consistent and the model (package) is suitable, what should I expect to get on the deferred samples?

Answer: the quality metric (QM) on the deferred samples should correlate with the quality metric on the validation folds.

In reality, achieving this is a big problem.

Why do I need this? So that I can confidently select the best models relying only on my cross-validation, since I will have already verified that it gives results consistent with the future.

Committees and other top-level models also need to pass the fitness test on that deferred sample on which they have not been trained in any way.
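A minimal sketch of that correlation check, with hypothetical vector names and made-up example numbers; in practice each vector holds one quality-metric value per trained model:

# one entry per trained model
qmCrossValidation <- c(0.57, 0.61, 0.55, 0.63, 0.59)   # metric on the validation folds
qmDeferredSample  <- c(0.50, 0.56, 0.49, 0.58, 0.52)   # metric on the deferred ("after") sample

# if cross-validation can be trusted, the two should move together
cor(qmCrossValidation, qmDeferredSample, method = "spearman")

# rank models by the CV metric and check whether the leaders also lead on the deferred sample
order(qmCrossValidation, decreasing = TRUE)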

 
Dr.Trader:


2) Train the model on the training sample (the model parameters and the set of predictors are the same throughout the loop; they are what you are trying to evaluate), then predict these very same data with this model and score the prediction: accuracy, R^2, or something else. For example, I round the regression result to classes and use Cohen's Kappa for the score; it is in caret, and I liked it better than plain classification or regression accuracy. But it only works for two classes. For three classes it is hard for me to suggest anything; the important thing is that the score accounts for the accuracy of each class separately and produces some overall estimate from that.
3) Apply the trained model to predict the data from the validation sample, and score the prediction with the same function.
4) Both scores (training and validation) should be close to each other, and as high as possible. For the final score I use min(score1, score2) - (max(score1, score2) - min(score1, score2)): their delta is subtracted from the smaller value.

At the end of each iteration we get some score, and because of the random data split it may vary from -1 to 1 (or over another interval, depending on the function used). We compute the average of these scores and return it as the result of the fitness function. In addition, I subtract a small number (0.0001) from the fitness value for each predictor used, to penalize the model for requiring too large a set of data.
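A hedged sketch of the scoring step described above, using Cohen's Kappa taken from caret's confusionMatrix and the "minimum minus delta" combination; the prediction and observation vectors are toy values:

library(caret)

# hypothetical two-class results (regression output already rounded to classes)
predictedTrain    <- factor(c(1, 0, 1, 1, 0), levels = c(0, 1))
observedTrain     <- factor(c(1, 0, 0, 1, 0), levels = c(0, 1))
predictedValidate <- factor(c(0, 0, 1, 1, 1), levels = c(0, 1))
observedValidate  <- factor(c(0, 1, 1, 1, 0), levels = c(0, 1))

# Cohen's Kappa for the training and validation predictions
score1 <- confusionMatrix(predictedTrain,    observedTrain)$overall["Kappa"]
score2 <- confusionMatrix(predictedValidate, observedValidate)$overall["Kappa"]

# final score: the delta between the two estimates is subtracted from the smaller one
scoreFinal <- min(score1, score2) - (max(score1, score2) - min(score1, score2))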


Can you clarify, please: by "training" do you mean the evaluation on the cross-validation folds?

And does the closeness of the training and validation estimates correlate with the results on the deferred sample and in cross-validation?

If that's the case, then we are very close.

 
Alexey Burnakov:

Can you clarify, please: by "training" do you mean the evaluation on the cross-validation folds?

And does the closeness of the training and validation estimates correlate with the results on the deferred sample and in cross-validation?

If that's the case, then we are very close.

What does "correlation" mean? "+1"? "-1"? What with what?

Fifty results of a run is something I understand. With 50 results you can already apply statistics: compute the mean, the deviations, and most importantly the confidence interval....

 
mytarmailS:

Can you see the result of yesterday's trade?

I won't say it's perfect, but still the model is 80% generalized....

 
SanSanych Fomenko:

What does "correlation" mean? "+1"? "-1"? What with what?

Fifty results of a run is something I understand. With 50 results you can already apply statistics: compute the mean, the deviations, and most importantly the confidence interval....

No, you didn't get it, SanSanych.

I correlate the quality metric on the deferred sample with the quality metric on cross-validation (that is, on the test sample that evaluates the trained model). Since we have hundreds of trained models, we get two vectors of quality metrics.

For example: classification accuracy on cross-validation is 57%, and on the deferred sample it is 50%. And there are hundreds (thousands) of such pairs of values, because we have hundreds or thousands of trained models. Hence the question.

 
Alexey Burnakov:

Can you clarify, please: by "training" do you mean the evaluation on the cross-validation folds?

And does the closeness of the training and validation estimates correlate with the results on the deferred sample and in cross-validation?

If that's the case, then we are very close.

In code it's something like this:

fitness <- function(inputTestPredictors, inputTestModelParams) {
    # inputTestPredictors - logical vector marking which columns of trainData are used as predictors;
    # the target column is added explicitly so that the formula target ~ . works
    usedColumns <- unique(c(colnames(trainData)[inputTestPredictors], "target"))
    allScores <- c()
    for(i in 1:50){
        rowSampleTrain <- sample(nrow(trainData), round(nrow(trainData)*0.5))
        rowSampleValidate <- setdiff(1:nrow(trainData), rowSampleTrain)
        # still to be added: with 50% probability, split the rows simply in order, without sample

        model <- TrainModel(target ~ ., data = trainData[rowSampleTrain, usedColumns], p1 = inputTestModelParams$parameter1, p2 = inputTestModelParams$parameter2)
        # instead of TrainModel - some model-training function from an R package: rf, gbm, nnet, lm, ...

        predictResultsForTrain <- predict(object = model, newdata = trainData[rowSampleTrain, usedColumns])
        predictResultsForValidate <- predict(object = model, newdata = trainData[rowSampleValidate, usedColumns])

        score1 <- CalcPreditionQuality(predictResultsForTrain, trainData[rowSampleTrain, "target"])
        score2 <- CalcPreditionQuality(predictResultsForValidate, trainData[rowSampleValidate, "target"])
        score_final <- min(score1, score2) - (max(score1, score2) - min(score1, score2))
        allScores <- c(allScores, score_final)
        # CalcPreditionQuality - a function that scores the prediction against the expected values,
        # e.g. accuracy, F-score, or kappa
    }
    predictorCountPenalty <- sum(inputTestPredictors == TRUE) * 0.0001
    return(mean(allScores) - predictorCountPenalty)
}
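A purely illustrative usage sketch: the placeholders are filled with made-up stand-ins (glm for TrainModel, plain accuracy for CalcPreditionQuality, random toy data for trainData) just to show how such a fitness function gets wired up and called:

# toy stand-ins, only to make the sketch runnable
set.seed(1)
trainData <- data.frame(matrix(rnorm(1000 * 4), ncol = 4))
colnames(trainData) <- c("x1", "x2", "x3", "x4")
trainData$target <- as.numeric(trainData$x1 + rnorm(1000) > 0)

TrainModel <- function(formula, data, p1, p2) {
    glm(formula, data = data, family = binomial)   # p1/p2 are ignored in this stand-in
}
CalcPreditionQuality <- function(prediction, expected) {
    mean(as.numeric(prediction > 0) == expected)   # accuracy after rounding to classes
}

# evaluate one candidate: predictors x1 and x3, dummy model parameters
fitness(inputTestPredictors  = c(TRUE, FALSE, TRUE, FALSE, FALSE),
        inputTestModelParams = list(parameter1 = NA, parameter2 = NA))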
 
Elements of machine learning were applied in practice not to training on trading history, but to the history of using a set of signals/options within one's own robotic trading sessions, in order to correlate the firing of certain models in the "live" market with certain configurations / market states / signals.
 
Dr.Trader:

In the code, it's something like this:

Just tell me in words: which samples are you comparing? There is training, there is test (multiple training/test splits = cross-validation), and there is validation (the deferred sample).

I have three columns:

training (all)        test folds (cross-validation)        deferred observations
0.7                   0.65                                 0.55
...                   ...                                  ...

That the training score correlates with the test score tells us nothing, because the model eventually chosen is trained on the whole set, parts of which were included in the test.

But whether the test estimates correlate with the deferred ones is what matters to me.

 

In your terms, I'm comparing training and test.
I don't have a validation (deferred) sample when training the model. The deferred sample will be the new data that the model trades on after training.

That the training score correlates with the test score tells us nothing.

That's why I do multiple data partitions and retrain the model many times. If the model parameters are poor, the average result on the test samples will be much lower than the average result on the training samples.

 
Dr.Trader:

In your terms, I'm comparing training and test.
It turns out I have no validation (deferred) sample when training the model. The deferred sample will be the new data the model works on after training.

That's why I do multiple data partitions and retrain the model many times. If the model parameters are poor, the average result on the test samples will be much lower than the average result on the training samples.

There is a kernel of value in what you are doing.

However, you should also try a deferred sample. This is the classic scheme: train, test, validate.

And make the procedure even more elaborate. For each model that looks good on training and test, call such a model X, run validation on the deferred data. That gives you an idea of whether you are choosing models correctly using training and test alone. Build many models with different parameters, pick the best ones (10, 100, 1000), and validate them. You will see whether your "best" metric carries over to future data or not. Only after that do you go into battle.
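A minimal sketch of that selection test with made-up placeholder vectors, one test score and one deferred-validation score per trained model:

nModels       <- 1000
testScore     <- runif(nModels)   # placeholder: the metric used to pick the "best" models
deferredScore <- runif(nModels)   # placeholder: the metric on deferred data never used for selection

# pick the best models by the test score only
bestIdx <- order(testScore, decreasing = TRUE)[1:100]

# if the selection procedure works, the chosen models should also stand out on the deferred data
mean(deferredScore[bestIdx]) > mean(deferredScore)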
