Machine learning in trading: theory, models, practice and algo-trading - page 1298

 

Earlier I talked about the theoretical possibility of removing noisy values from the array; here is the original model

and here I removed the noise from -0.01 to 0.01 from the array of binary tree response weights

The gain is slightly smaller, but the relative performance has improved.

 

Hmm, now I've zeroed out the binary tree values from -0.02 to 0.02

It seems the improvement is considerable, which means there is a rational core here - further research is needed.
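A minimal sketch of how such zeroing of small tree responses could be done, assuming the model was exported with CatBoost's save_model(..., format="json") and that each tree keeps its responses in a "leaf_values" list; the file names, the JSON layout and the 0.02 threshold here are assumptions for illustration, not the exact script used above:

import json

THRESHOLD = 0.02  # zero out leaf responses in [-0.02, 0.02]

with open("model.json") as f:           # hypothetical file name
    model = json.load(f)

for tree in model["oblivious_trees"]:   # assumed layout of the JSON export
    tree["leaf_values"] = [0.0 if abs(v) < THRESHOLD else v
                           for v in tree["leaf_values"]]

with open("model_denoised.json", "w") as f:
    json.dump(model, f)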

 
Aleksey Vyazmikin:

In very primitive terms: the first tree has no response to the sample and returns zero, while the fourth tree does have a response and puts the "probability" at 0.6 - technically it corrected the first tree's error, but in fact it revealed a relationship that previously did not exist at all.

The error of the first tree can be not only 0, but also 1.

I.e. if the first tree predicted 1 but the actual value is 0, then the subsequent trees should lower the total from 1 to 0; trees 2, 3, etc. will give negative predictions so that, after N steps, the 1 from the first tree is brought down to 0 by successive subtractions.
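A tiny numeric illustration of that correction (the per-tree numbers are made up): the ensemble score is just the sum of the tree outputs, so later trees can pull an early tree's wrong score back toward the true label.

tree_outputs = [1.0, -0.4, -0.3, -0.2]  # made-up responses of trees 1..4 for one sample whose true label is 0

score = 0.0
for i, out in enumerate(tree_outputs, start=1):
    score += out
    print(f"after tree {i}: raw score = {score:.2f}")
# trees 2-4 return negative values, lowering the running total from 1.0 towards 0.1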

 
elibrarius:

The error of the first tree can be not only 0, but also 1.

I.e. if the first tree predicted 1 but the actual value is 0, then the subsequent trees should lower the total from 1 to 0; trees 2, 3, etc. will give negative predictions so that, after N steps, the 1 from the first tree is brought down to 0 by successive subtractions.

Of course, I agree. But it does not change the essence: what matters is the pattern the tree finds, and interpreting that pattern is the job of a separate algorithm.

 
Aleksey Vyazmikin:

As for the community, I don't know - that is, I don't know how other people in other areas do it.

Pulling out data seems logical to me, because I'm using ML to search for a model of human (or algorithm) behavior; there can be many such behavior patterns and they can be independent, so it makes sense to pull out as many as possible, since it's impossible to generalize them all together. For others, the market is a single whole, the result of collective thinking, some kind of voting body with no rules; they are probably looking for the right model for that situation, one that describes market behavior as a separate organism.

Well, how can you not know when you're part of it )

Maybe I do get it, since the original goal was to make something like an AI that selects everything by itself, without manual routine. Routine only when designing such a thing.

I can't imagine how it's possible to manually look through hundreds or thousands of models and pick something out of them. On the contrary, I want to forget about "inventing" trading systems like a bad dream.

 
Maxim Dmitrievsky:

Well, how can you not know when you're part of it )

Maybe I do get it, since the original goal was to make something like an AI that selects everything by itself, without manual routine. Routine only when designing such a thing.

I can't imagine how it's possible to manually look through hundreds or thousands of models and pick something out of them. On the contrary, I want to forget about "inventing" trading systems like a bad dream.

And I have no idea how to analyze each model separately - that's why I put the emphasis on batch processing. Individual models should be analyzed in detail in order to improve the overall algorithm of the model-creation cycle and to find new ideas.

The problem is that when you have hundreds of thousands of model variants that give completely different results, it's hard to understand what to do to improve the results - this is where I have the biggest sticking point. At first I get an interesting model with 4 predictors and it seems to me that there is no point in adding more predictors and I should just generate more models; then, on the contrary, I use a lot of predictors, the training sample has more influence, and there are also more training parameters in CatBoost itself. That's why I'm inclined to generate a lot of models, save 2-3 out of every 100k, and study those more thoroughly.

 
Aleksey Vyazmikin:

And I have no idea how to analyze each model separately - that's why I put the emphasis on batch processing. Individual models should be analyzed in detail in order to improve the overall algorithm of the model-creation cycle and to find new ideas.

The problem is that when you have hundreds of thousands of model variants that give completely different results, it's hard to understand what to do to improve the results - this is where I have the biggest sticking point. At first I get an interesting model with 4 predictors and it seems to me that there is no point in adding more predictors and I should just generate more models; then, on the contrary, I use a lot of predictors, the training sample has more influence, and there are also more training parameters in CatBoost itself. That's why I'm inclined to generate a lot of models, save 2-3 out of every 100k, and study those more thoroughly.

Yes, something like that - it's desirable to automate as much as possible, so that all that's left is the trivial choice of which one you like best, over a cup of coffee.

It's hard to do, I agree, but then it will be AI, not just some run-of-the-mill classifier.

As for the latter, there are AutoML libraries - a neural network selects the best neural network or set of models for a particular task, which is also cool. I haven't used them yet.
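For illustration, a minimal self-contained sketch of the AutoML idea on synthetic data; TPOT is used here only as an accessible analogue (it evolves scikit-learn pipelines rather than having a neural network pick neural networks, so it is not the exact kind of library meant above):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

# synthetic data just to make the example runnable
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# automated search over preprocessing + model pipelines
tpot = TPOTClassifier(generations=5, population_size=20, cv=5,
                      random_state=42, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))   # out-of-sample accuracy of the best pipeline
tpot.export("best_pipeline.py")     # dump the winning pipeline as Python code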
 
Maxim Dmitrievsky:

Yes, something like that - it's desirable to automate as much as possible, so that all that's left is the trivial choice of which one you like best, over a cup of coffee.

It's hard to do, I agree, but then it will be AI, not just some run-of-the-mill classifier.

Now, after training, the results are processed by my script (without a model interpreter - using the results of the CatBoost calculation), and out of 100k models I get the ones that meet the defined criteria (model criteria and trade criteria) on all three samples - about 50-100 models. I convert them for playback in the terminal and do a second pass there for further, more detailed selection. In principle, you don't even have to run them in the terminal if you know exactly what you want, but I'm still searching for the selection criteria and think it's useful to look at different models visually. It might be possible to save the balance curves from a script, but I don't know how to work with charts.

I don't know whether you can create many models at once in Python, but if you're interested, I can send you the batch files I use to do it.
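Roughly, that kind of batch selection could look like the sketch below (in Python rather than the batch files mentioned above). The file names, the accuracy metric and the 0.55 threshold are hypothetical stand-ins for the actual model and trade criteria:

import glob
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.metrics import accuracy_score

def load_sample(path):
    # hypothetical CSV layout: features in all columns except the last, label in the last
    df = pd.read_csv(path)
    return df.iloc[:, :-1], df.iloc[:, -1]

samples = {name: load_sample(f"{name}.csv") for name in ("train", "valid", "exam")}

selected = []
for path in glob.glob("models/*.cbm"):          # all saved CatBoost models
    model = CatBoostClassifier()
    model.load_model(path)
    # keep the model only if it passes the criterion on all three samples
    if all(accuracy_score(y, model.predict(X)) > 0.55 for X, y in samples.values()):
        selected.append(path)

print(f"{len(selected)} models passed the filter")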

 
Aleksey Vyazmikin:

Now, after training, the results are processed by my script (without a model interpreter - using the results of the CatBoost calculation), and out of 100k models I get the ones that meet the defined criteria (model criteria and trade criteria) on all three samples - about 50-100 models. I convert them for playback in the terminal and do a second pass there for further, more detailed selection. In principle, you don't even have to run them in the terminal if you know exactly what you want, but I'm still searching for the selection criteria and think it's useful to look at different models visually. It might be possible to save the balance curves from a script, but I don't know how to work with charts.

I don't know whether you can create many models at once in Python, but if you're interested, I can send you the batch files I use to do it.

In Python you can do anything and more.

No, not yet, thanks... I'm reading some interesting books at the moment. I also used CatBoost in Python and compared it with random forest; I didn't see any big improvement, but it works fine on its own. Literally a couple of lines.
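For reference, "a couple of lines" really is about all it takes; a minimal sketch on synthetic data (the actual predictors and target from this thread are not shown):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier

# synthetic data just to make the example self-contained
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = CatBoostClassifier(iterations=300, depth=6, verbose=False)
model.fit(X_train, y_train, eval_set=(X_test, y_test))
print("test accuracy:", model.score(X_test, y_test))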

 
Maxim Dmitrievsky:

AutoML - a neural network selects the best neural network or set of models for a particular task, which is also cool. I haven't used it yet.

Yes, I did something similar - the question, again, is in the predictors and the selection criterion (the target). Right now (many months later) I'm finalizing all the ideas with predictors, and then I'll come back to this topic. As for results: I posted earlier how similar models perform, but a variety of samples with different scatter is needed, preferably from different models.

And what does this AutoML use as predictors and as the target?
