Machine learning in trading: theory, models, practice and algo-trading - page 3167

 
Maxim Dmitrievsky #:

0.99 train/test, with the model truncated to a couple of iterations. Only a few rules remain that predict classes well.

TP=10 and SL=1000 ?)

 
Forester #:

TP=10 and SL=1000 ?)

No, it's just for fun, if you want to make a lot of trades: open new ones on every bar.
 
Vladimir Perervenko #:

What do you mean, homemade? There is a theoretical justification, a good article. There's a package called RLTv3.2.6. It works pretty well. Pay attention to the version.


Good luck

In my opinion, it is not homemade if the following conditions are fulfilled, with a specific example below.

Initially the site was full of self-made "geniuses" (now far fewer) who, sitting in their kitchens, invented something, used terminology out of their own heads and started "researching", and not just "researching" but disproving established and generally recognised things.

All these people don't realise that their home-made code is not worth a penny, as it has NO theoretical substantiation published in serious journals and then discussed, often for years, by people with the corresponding training. Then the code is written and tested by a large number of users, and only after that does it all become suitable for industrial use.

There is no point in discussing local "geniuses".

But take CatBoost.

Let's compare the documentation for CatBoost and XGBoost to see the difference between the sloppy work of a non-core organisation and a professional development that is otherwise very similar.

 
Maxim Dmitrievsky #:
And the main homemade "genius" of them all is Breiman, because he didn't write in R. Such a kolkhoznik.

Learn R so that you don't look completely ignorant: practically all packages in R are NOT written in R. Usually it's C++ or Fortran, and R is just the interface to them. That's why computationally intensive algorithms in R run no slower than in C++.
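The same division of labour is easy to demonstrate in Python (an illustration of mine, not from the thread): the public API lives in the scripting language while the heavy lifting is compiled.

    # Minimal sketch: a pure-Python loop vs numpy, whose sum() delegates
    # to compiled C code, just as R packages delegate to C++/Fortran.
    import time
    import numpy as np

    x = np.random.rand(10_000_000)

    t0 = time.perf_counter()
    s_py = sum(x.tolist())        # pure-Python loop over a list
    t_py = time.perf_counter() - t0

    t0 = time.perf_counter()
    s_np = x.sum()                # runs in the compiled backend
    t_np = time.perf_counter() - t0

    print(f"pure Python: {t_py:.2f}s, numpy: {t_np:.3f}s")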

 
СанСаныч Фоменко #:

Learn R so that you don't look completely ignorant: practically all packages in R are NOT written in R. Usually it's C++ or Fortran, and R is just the interface to them. That's why computationally intensive algorithms in R run no slower than in C++.

No way, it's the first time I've heard that.

Will there be any more enlightening information? )

I've already got to the catbuster... )))

 
mytarmailS #:

...after dimensionality reduction, the model became more repeatable.

and the last, perhaps decorative, touch

I wonder how the ML model will train on such data?

This is a test sample.

Have you ever seen numbers like this in your own data?

Most likely it is overfitted, since it is tied to absolute price values.
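One common remedy (a hedged sketch of my own, not something posted in the thread) is to build features from relative changes, which do not depend on the absolute price level:

    # Simple vs log returns: both are scale-free, so the model no longer
    # keys on absolute price values. The price series here is made up.
    import numpy as np
    import pandas as pd

    prices = pd.Series([100.0, 101.5, 101.0, 102.3, 102.0])

    returns = prices.pct_change().dropna()                   # (p_t - p_{t-1}) / p_{t-1}
    log_returns = np.log(prices / prices.shift(1)).dropna()  # ln(p_t / p_{t-1})

    print(returns.to_list(), log_returns.to_list())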

 

Wrote a function that relabels the targets and makes them more predictable from your features; the model becomes more stable.

If you have a small dataset, you can send it over for a check and see for yourself on your data (or get disappointed).

For the python folks:

    from sklearn.cluster import KMeans

    c = coreset[coreset.columns[1:-4]]  # your dataset without labels; take only train/test data, don't cluster the rest, otherwise it's curve fitting
    kmeans = KMeans(init='k-means++', n_clusters=clusters).fit(c)  # the number of clusters is a hyperparameter
    coreset['clusters'] = kmeans.predict(c)
    mean_labels = coreset.groupby('clusters')['labels'].mean()  # mean label within each cluster
    coreset['labels'] = coreset.apply(lambda row: 0 if mean_labels[row['clusters']] < 0.5 else 1, axis=1)  # if the mean is above 0.5, every element of the cluster gets label 1, and vice versa
The model is more stable if the clusters are representative. So brute-force the number of clusters and which features to cluster on.
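A minimal sketch of such a brute-force search, assuming the same coreset DataFrame and binary labels column as above; the grid of cluster counts and the scoring model are my assumptions, not part of the original post:

    from sklearn.cluster import KMeans
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    def relabel(df, n_clusters):
        # relabel targets with the majority (mean >= 0.5) label of each cluster
        c = df[df.columns[1:-4]]                  # features only, as above
        km = KMeans(init='k-means++', n_clusters=n_clusters, n_init=10).fit(c)
        out = df.copy()
        out['clusters'] = km.predict(c)
        mean_labels = out.groupby('clusters')['labels'].mean()
        out['labels'] = (mean_labels.loc[out['clusters']].to_numpy() >= 0.5).astype(int)
        return out

    best_k, best_score = None, -1.0
    for k in range(5, 51, 5):                     # hypothetical grid of cluster counts
        rl = relabel(coreset, k)
        X, y = rl[rl.columns[1:-4]], rl['labels']
        if y.nunique() < 2:                       # degenerate relabeling, skip
            continue
        # score how learnable the relabeled targets are
        score = cross_val_score(GradientBoostingClassifier(), X, y, cv=3).mean()
        if score > best_score:
            best_k, best_score = k, score
    print(best_k, best_score)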
 
Aleksey Nikolayev #:

As far as I understand, the commands for working with R in an interactive session are commented out. First you load the whole script to define the functions, and then you run the commands line by line, pressing Enter after each one. This is probably something of a standard in scientific publications: rely only on the command line and avoid environments like RStudio.

For the sake of brevity I mentioned only the CTree call, leaving out the data collection and the class templates, which also seem unavoidable.

Anomaly detection is among the goals there: it looks for places where fires are anomalously frequent.


PS. Some time ago I wrote to you about using the Poisson distribution, and here it has been developed into working code.

I haven't tried it all yet - I'm stuck on one of my tasks.

I will definitely try to run it on my own data. I am accumulating different solutions on this topic.

Regarding the Poisson distribution: it's interesting in theory, but when I look at the data, at the sequence, there may be, say, 20 zeros in a row and then a mix of zeros and ones, and these gaps are significant; they seem inconsistent with the distribution.
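A quick sanity check for this (my own sketch; the function and the toy sequence are illustrative assumptions): under a Poisson assumption the variance of event counts per window should be close to the mean, and long runs of zeros push the ratio above 1.

    import numpy as np

    def dispersion_index(events, window=20):
        # variance-to-mean ratio of event counts per fixed-size window;
        # ~1 is consistent with Poisson, values far from 1 are not
        events = np.asarray(events)
        n = len(events) // window * window        # trim to whole windows
        counts = events[:n].reshape(-1, window).sum(axis=1)
        return counts.var() / counts.mean()

    # toy sequence: a long run of zeros, then a mix of zeros and ones
    seq = [0] * 20 + [1, 0, 1, 1, 0, 1, 0, 0, 1, 1] * 10
    print(dispersion_index(seq))                  # the zero run inflates the ratio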

 
Aleksey Vyazmikin #:

Regarding the Poisson distribution: it's interesting in theory, but when I look at the data, at the sequence, there may be, say, 20 zeros in a row and then a mix of zeros and ones, and these gaps are significant; they seem inconsistent with the distribution.

The idea is to split the examples into groups that differ from one another and within which there is homogeneity. It is not at all certain that the specific features allow this to be done. Nor is it a fact that any features do, because of non-stationarity, for example.

I don't plan to study this article in detail, as it only touches on the topic I am interested in. CHAID is a bit closer, but still not quite the same.

 
Vladimir Perervenko #:

Vladimir, what is the maximum "honest" accuracy you got on new data?

And with which ML algorithm?
