Machine learning in trading: theory, models, practice and algo-trading - page 1290

 
Aleksey Vyazmikin:

This fits neatly into my theory about the market. Someone with a lot of money switched on their algorithm for accumulating positions, some big bank, maybe even the Central Bank. Of course, this is not done quickly, but since this participant was dominant and the market situation favored it, it was possible to find traces of their algorithm. Naturally, once the participant stopped influencing the market, those signs stopped working. There are many such participants (maybe a hundred), and their algorithms may overlap, but my assumption is that they are similar (recall technical analysis and the requirement for banks to justify their trading operations with reference to that analysis, at least in Russia). For that reason it makes sense to analyze a large sample in which the same algorithm has run many times; then there is a chance to understand how it works and describe it by indirect signs. The model must learn to identify the algorithm, and not trade on noise in the meantime while waiting for the conditions under which the algorithm operates. I think this would work even better on stocks and derivatives; I don't do ML on forex.

But at the end of the day, we need to find, say, ten models that describe the big-money algorithms and learn to determine which algorithm is currently active. Since an algorithm's cycle can span a couple of days and will probably repeat over a short period, it's fine if we enter with a small delay; the main thing is to choose the right model for that algorithm.

Well, I don't know about finding ten models. But the fact that the information entering the market is structured into a cycle with a beginning, a continuation, and an end is certain.

The bifurcation point is the moment of impact of a new explosive portion of information, which is then structured over time: a perturbation that disperses in circles (waves) for some time.

That is why, for example, patterns within a single futures contract are stronger than patterns across different futures contracts.

 
Maxim Dmitrievsky:

it exists on history; how to algorithmize it, I don't know

It may well exist on history. But in real time, and ahead of time, can it even be detected? If that is unknown, it is not at all certain that a solution even exists.

I usually check such things with statistics. For the most part, the result is nothing). You can see it with your eyes, but the statistics say there is nothing there: an apparent reflection of an apparent moon).

 
Yuriy Asaulenko:

It may well exist on history. But in real time, and ahead of time, can it even be detected? If that is unknown, it is not at all certain that a solution even exists.

I usually check such things with statistics. For the most part, the result is nothing). You can see it with your eyes, but the statistics say there is nothing there: an apparent reflection of an apparent moon).

If you look at the chart, you can see it with your eyes; I don't know how to do it programmatically.

 
Maxim Dmitrievsky:

Well, I don't know about finding ten models. But the fact that the information entering the market is structured into a cycle with a beginning, a continuation, and an end is certain.

This requires abandoning always-in-market (flip) algorithms and teaching the model not to be in the market all the time. My strategy allows this, and I'm now experimenting in this direction with CatBoost models. Unfortunately, the tree leaves I had been collecting for half a year can no longer be applied, because I made some changes to the predictors (there were errors in the logic, and I eliminated delays at bar opening, i.e. the real-time calculation problem was the delays). Still, working with them confirms that it is possible to detect patterns that are individually more accurate than the predictive power of a whole tree/forest, and therefore their combination gives good results.
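One simple way to let a model "not always be in the market", sketched here as a hypothetical illustration (the thresholds and names are mine, not from the thread), is a three-way decision rule with a no-trade band around the predicted probability:

```python
# Minimal sketch of a model that is not always in the market:
# instead of flipping long/short on every bar, act only when the
# predicted probability of an up-move is far enough from 0.5.
# The band width of 0.1 is illustrative, not from the thread.

def decide(p_up: float, band: float = 0.1) -> str:
    """Map a predicted probability of an up-move to an action."""
    if p_up >= 0.5 + band:
        return "long"
    if p_up <= 0.5 - band:
        return "short"
    return "flat"  # stay out of the market on low-confidence signals

signals = [0.52, 0.71, 0.30, 0.55]
print([decide(p) for p in signals])  # low-confidence bars map to "flat"
```

The same effect can be achieved with a separate "no trade" class in the target, but a probability band is the least invasive change to an existing binary model.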

Maxim Dmitrievsky:

And the bifurcation point is just the moment of exposure to a new explosive portion of information, which is then structured over time: a perturbation that disperses in circles (waves) for some time.

That is why, for example, patterns within a single futures contract are noticeably stronger than patterns across different futures contracts.

So these points need to be identified; perhaps they should be made targets. One model identifies them, and depending on the target, the appropriate trading model is chosen. Another issue is that here again we have to think about predictors and about compressing the data so that the cycle-start period is expressed in a single row, when the period itself may span, say, 10 bars.
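The two-stage scheme described above (one model detects the regime, another trades it) can be sketched as follows; the detector and the trading models here are illustrative stubs, not anyone's actual models:

```python
# Two-stage scheme: a detector model labels the current regime
# (e.g. a cycle start), and a separate trading model is chosen
# for that regime. All components below are illustrative stubs.

from typing import Callable, Dict, Sequence

def route(detector: Callable[[Sequence[float]], str],
          models: Dict[str, Callable[[Sequence[float]], str]],
          window: Sequence[float]) -> str:
    """Pick and apply the trading model matching the detected regime."""
    regime = detector(window)
    model = models.get(regime)
    return model(window) if model else "flat"  # no model: stay out

# Toy detector: a large range over the window marks a "cycle_start".
def toy_detector(window):
    return "cycle_start" if max(window) - min(window) > 1.0 else "noise"

# Toy per-regime model: trade in the direction of the move.
models = {"cycle_start": lambda w: "long" if w[-1] > w[0] else "short"}

print(route(toy_detector, models, [1.0, 1.2, 2.4]))   # prints "long"
print(route(toy_detector, models, [1.0, 1.1, 1.05]))  # prints "flat"
```

Note the design choice: regimes without a registered model default to staying out of the market, which matches the earlier point about not trading on noise.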

 
Maxim Dmitrievsky:

If you look at the chart, you can see it with your eyes; I can't do it programmatically.

Programmatically, it's a very long story.

In the previous system there were >30 conditions (parameters) for long entry, the same for shorts, and slightly fewer for exits. It took a lot of work, including constructing many condition sets, separating them, filtering out with additional conditions any trades that fell outside the sets, etc.

 
Aleksey Vyazmikin:
Yuriy Asaulenko:

Programmatically, it's a very long story.

In the previous system there were >30 conditions (parameters) for long entry, the same for shorts, and slightly fewer for exits. It took a lot of work, including constructing many condition sets, separating them, filtering out with additional conditions any trades that fell outside the sets, etc.

So these points need to be identified; perhaps they should be made targets. One model identifies them, and depending on the target, the appropriate trading model is chosen. Another issue is that here again we have to think about predictors and about compressing the data so that the cycle-start period is expressed in a single row, when the period itself may span, say, 10 bars.

Well, that's clear, no one is arguing ))

 
About CatBoost: it turns out that on the GPU you can run more than one training at once via batch files, i.e. launch two batch files simultaneously, each calling the console application. At least on my models, the speed at which the GPU produces models does not change, which means the computations can be parallelized. The limits and limitations are not yet fully understood. Drop your alglib, and let's go torture CatBoost ;)
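The "two batch files at once" idea above can be sketched as launching two console processes in parallel. The commands below are deliberate placeholders (simple sleeps); in practice each would be a CatBoost console invocation, whose exact flags depend on your CatBoost version and are not given in the thread:

```python
# Sketch of launching two console trainings in parallel.
# The child commands are placeholders (sleeps standing in for a
# CatBoost console run); the point is only that two console
# processes run concurrently, so two 1-second jobs finish in
# roughly 1 second of wall time, not 2.

import subprocess
import sys
import time

def launch(seconds: float) -> subprocess.Popen:
    # Placeholder for launching the CatBoost console application.
    return subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({seconds})"]
    )

start = time.monotonic()
procs = [launch(1.0), launch(1.0)]  # two "trainings" at once
for p in procs:
    p.wait()
elapsed = time.monotonic() - start
print(f"two 1s jobs finished in {elapsed:.1f}s")
```

Whether two concurrent trainings actually share one GPU without slowing each other down is, as the post says, something to measure on your own models.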
 
Aleksey Vyazmikin:
About CatBoost: it turns out that on the GPU you can run more than one training at once via batch files, i.e. launch two batch files simultaneously, each calling the console application. At least on my models, the speed at which the GPU produces models does not change, which means the computations can be parallelized. The limits and limitations are not yet fully understood. Drop your alglib, and let's go torture CatBoost ;)

The program is surprisingly good (unlike everything else from Yandex); even CERN uses it to process data from the collider.

No time yet; maybe later.

 
Aleksey Vyazmikin:

There is also cool software, KNIME: all sorts of boosting, data analysis, and visualization.

You can tinker with xgboost without programming; it seems CatBoost can be added too.

KNIME - Open for Innovation
  • 2019.01.28
  • www.knime.com
KNIME, the open platform for your data.
 
Maxim Dmitrievsky:

The program is surprisingly good (unlike everything else from Yandex), even CERN uses it to process data from the collider

No time yet; maybe later.

I think the open-source code makes an important contribution here: it is periodically corrected and new releases come out. Eh, if only I could read that code... it seems to me there is a Klondike of ideas in there that you could borrow and develop yourself, coming up with your own boosting.


Maxim Dmitrievsky:

There is also cool software, KNIME: all sorts of boosting, data analysis, and visualization.

You can tinker with xgboost without programming; it seems CatBoost can be added too!

Thank you! However, for the time being CatBoost is enough for me, because I have debugged the whole cycle there, from creating the sample to implanting the model into an Expert Advisor. And unlike a bridge via Python, I can use the tester's optimization to try different models, their combinations, and different interpretations of the "probabilities" they produce.

True, I can't yet work with categorical features (a CatBoost specialty): I don't have an interpreter for such models. But my preliminary research showed that using them gives more stability across time intervals, i.e. the models are better, though training is about 5 times slower.
