Discussion of article "Random Forests Predict Trends"

 

New article "Random Forests Predict Trends" has been published:

This article considers using the Rattle package to search automatically for patterns that predict long and short positions on Forex currency pairs. It may be useful to both novice and experienced traders.

The initial aim of building any trading system is to predict the behavior of a market instrument, for instance, a currency pair. Predictions can have different objectives. We shall confine ourselves to predicting trends, or, more precisely, to predicting growth (long positions) or decline (short positions) of currency pair quotes.

To predict a currency pair's behavior, a trader typically attaches a couple of indicators to its chart and tries to find a pattern that has predictive power.

This article considers the automatic selection of patterns and their preliminary evaluation using the Rattle package, a library for the R statistical analysis system.

We are going to use R, which is ideally suited to forecasting financial markets, to predict the behavior of currency pairs. That said, R is primarily a programming language for qualified statisticians and is beyond the reach of many traders. Its complexity is compounded by the fact that the prediction tools are numerous and scattered across the many packages that make up R's basic functionality.

Rattle (the R Analytical Tool To Learn Easily) brings together a set of R packages that are important for developing trading systems but hard for novices to use separately. You do not need to know R to start working with Rattle. Working with Rattle produces R code that can be used to develop a real trading system; at that stage, however, knowledge of R will be required.

In any case, Rattle is an indispensable tool at the trading system design stage. It allows even beginners to quickly see and assess the results of various ideas.

Rattle (Williams, 2009) is free, open-source software distributed as a package for R (R Development Core Team, 2011). Because it is free software, the source code of both Rattle and R is available without restriction. The Rattle source code is written in R, and users are allowed and encouraged to study, test and extend it.

Fig. 1. The ZigZag indicator

Author: СанСаныч Фоменко (SanSanych Fomenko)

 

The first thought that came to mind when reading the article was "what a shoddy translation".

I looked up where the author is from: it turned out not to be a translation after all )))

"As the tool for predicting the behavior of currency pairs, we choose the R system, which is ideally suited to prediction tasks on financial markets and, in particular, to predicting the behavior of currency pairs."

I don't mean anything bad, I'm just thinking out loud. The article may well be good, even very good...

 

I'm interested in the idea itself (I didn't know about random forests). The tool feels artificially imposed; I'd like to reproduce the results without it, but the article does not state clearly what exactly was done or which data vectors were used. The ZigZag parameters are omitted. Is the target audience users of the Rattle package, or those who write in R? Without studying them, many points in the article are incomprehensible. The conclusions are vague: the article does not make clear whether this direction is promising. Examples of trades could have been given. Overall impression: it seems to have been written for a narrow circle.

PS. When I try to open TC.Rdata from the article, I get the following:

Error in sqrt(ncol(crs$dataset)) : 
  non-numeric argument to mathematical function
In addition: Warning message:
In rm(crs) : object 'crs' not found
 
wmlab:
I'm interested in the idea itself (I didn't know about random forests). The tool feels artificially imposed; I'd like to reproduce the results without it, but the article does not state clearly what exactly was done or which data vectors were used. The ZigZag parameters are omitted. Is the target audience users of the Rattle package, or those who write in R? Without studying them, many points in the article are incomprehensible. The conclusions are vague: the article does not make clear whether this direction is promising. Examples of trades could have been given. Overall impression: it seems to have been written for a narrow circle.

You're being rather nit-picky. I'm not a mathematician, yet I understood the essence quite well, apart from a few points that should have been explained in more detail. For example, I didn't quite understand the term "tree splitting": it occurs several times, and it's not clear what exactly is meant by it. As for conclusions, what conclusions do you need? "Go! To the barricades"? After reading this article, I realised that I had been reinventing the wheel, trying to devise similar algorithms myself instead of using a ready-made solution. As I understand it, the main idea was to popularise the R package among amateurs interested in statistics (though not among complete "dummies"). And in my opinion it turned out quite well.

 
wmlab:

I'm interested in the idea itself (I didn't know about random forests). The tool feels artificially imposed; I'd like to reproduce the results without it, but the article does not state clearly what exactly was done or which data vectors were used. The ZigZag parameters are omitted. Is the target audience users of the Rattle package, or those who write in R? Without studying them, many points in the article are incomprehensible. The conclusions are vague: the article does not make clear whether this direction is promising. Examples of trades could have been given. Overall impression: it seems to have been written for a narrow circle.

PS. When I try to open TC.Rdata from the article, I get the following:

Is the target audience users of the Rattle package, or those who write in R? Without studying them, many points in the article are incomprehensible.

I see two target audiences for Rattle:

Users with no R training, who can put together their own input .csv file and then build and evaluate the results of six models, not just random forests. The main problem is not the model but its input data. Once you have found suitable input data, you can commission the programming. The main thing is the idea; the programming itself can be handled afterwards.

PS. When I try to open TC.Rdata from the article, I get the following:

I re-checked, and everything works. The sequence of actions is as follows:

  • Start R
  • Open the File/Load Workspace menu
  • Select the file TC.RData from the unpacked archive
  • Launch Rattle (library(rattle), then rattle())
  • On the Data tab, choose R Dataset
  • In the Data Name drop-down list, select TC
  • Click "Execute"

You will get the list of vectors you were asking about.

The other audience is trained users. Rattle is a rather convenient tool for selecting the input data for a model. Most of the time spent building a trading system goes into selecting the input data, the most uncertain part of the work, and this is where Rattle is very useful: you can get a final estimate very quickly without going into complex mathematical model construction.

Good luck. I'm ready to explain further.

 
meat:

You're being rather nit-picky. I'm not a mathematician, yet I understood the essence quite well, apart from a few points that should have been explained in more detail. For example, I didn't quite understand the term "tree splitting": it occurs several times, and it's not clear what exactly is meant by it. As for conclusions, what conclusions do you need? "Go! To the barricades"? After reading this article, I realised that I had been reinventing the wheel, trying to devise similar algorithms myself instead of using a ready-made solution. As I understand it, the main idea was to popularise the R package among amateurs interested in statistics (though not among complete "dummies"). And in my opinion it turned out quite well.

For example, I didn't quite understand the term "tree splitting"

Take the root of the tree; it is drawn at the top.

Split the root by drawing two branches from it (this is root splitting), then repeat the same step at each level.

At each node of the tree, a condition formulated by the algorithm is checked. For example, if eurusd > 1.35, go along the left branch; otherwise, go along the right one.

The algorithm generated 500 such trees, deliberately with redundancy. That comes to something like 10,000 conditions.

Next, when the values of all predictors arrive (one value per predictor, 88 in total in my case), they are run through the trees, and a decision is made as to whether this particular set of 88 values implies a long or a short. In other words, it is a kind of pattern, only a very elaborate one.
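The splitting and voting just described can be illustrated with a short Python sketch. It is a minimal illustration, not the randomForest/Rattle implementation: the `Node` class, the three hand-written trees, the feature names (`eurusd`, `eur.dif1`, `gbpusd`) and all thresholds are hypothetical. Growing the trees (bootstrap samples, random subsets of predictors at each split) is left out; only the prediction step is shown, where each tree walks its conditions down to a leaf and the forest takes a majority vote.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """A decision-tree node: either an internal split or a leaf with a label."""
    feature: Optional[str] = None    # predictor tested at this node
    threshold: float = 0.0           # split value for that predictor
    left: Optional["Node"] = None    # followed when value > threshold
    right: Optional["Node"] = None   # followed otherwise
    label: Optional[str] = None      # set only on leaves: "long" or "short"


def predict_tree(node: Node, x: Dict[str, float]) -> str:
    """Walk from the root down to a leaf, checking one condition per node."""
    while node.label is None:
        node = node.left if x[node.feature] > node.threshold else node.right
    return node.label


def predict_forest(trees: List[Node], x: Dict[str, float]) -> str:
    """Each tree votes; the most common label wins (an odd tree count avoids ties)."""
    votes = Counter(predict_tree(t, x) for t in trees)
    return votes.most_common(1)[0][0]


def leaf(label: str) -> Node:
    return Node(label=label)


# Three hypothetical trees; a real forest holds hundreds of learned ones.
forest = [
    Node("eurusd", 1.35, left=leaf("long"), right=leaf("short")),
    Node("eur.dif1", 0.0, left=leaf("long"),
         right=Node("eurusd", 1.30, left=leaf("long"), right=leaf("short"))),
    Node("gbpusd", 1.50, left=leaf("short"), right=leaf("long")),
]

bar = {"eurusd": 1.36, "eur.dif1": -0.001, "gbpusd": 1.55}
print(predict_forest(forest, bar))  # two trees vote "long", one votes "short"
```

For comparison, the R randomForest package that Rattle uses for its Forest model grows 500 trees by default (`ntree = 500`), which matches the 500 trees mentioned above.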

 
faa1947:

I re-checked, and everything works. The sequence of actions is as follows:

  • Start R
  • Open the File/Load Workspace menu
  • Select the file TC.RData from the unpacked archive
  • Launch Rattle (library(rattle), then rattle())
  • On the Data tab, choose R Dataset
  • In the Data Name drop-down list, select TC
  • Click "Execute"

You will get the list of vectors you were asking about.

Thank you! Now it's clear.

Could you explain the meaning of these vectors: ZZ.75, ZZ.35?
What are the vectors *.dif1, *.dif2, *.dif3? Increments? Of what relative to what?
And the vectors eur, gbp, etc.: what are they?

And a more general question: why add indicator data at all? Isn't that the same as feeding in [x, f1(x), f2(x)]? I'm hinting at redundancy. Have you tried feeding in just the increments of the close prices?

Thanks in advance for your answers.

 
wmlab:

Thank you! Now it's clear.

Could you explain the meaning of these vectors: ZZ.75, ZZ.35?
What are the vectors *.dif1, *.dif2, *.dif3? Increments? Of what relative to what?
And the vectors eur, gbp, etc.: what are they?

And a more general question: why add indicator data at all? Isn't that the same as feeding in [x, f1(x), f2(x)]? I'm hinting at redundancy. Have you tried feeding in just the increments of the close prices?

Thanks in advance for your replies.

Could you explain the meaning of these vectors: ZZ.75, ZZ.35?

These are ZigZags with a minimum reversal distance of 75 pips and 35 pips respectively. The target variable TREND is derived from ZZ.35. These variables cannot be used as model inputs, because that would mean looking into the future.
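The look-ahead problem is easier to see in code. Below is a hypothetical Python sketch of a simplified ZigZag over closing prices and a TREND label derived from it; it is not the MetaTrader ZigZag indicator or the author's code. It works on closes only and uses a single `min_move` reversal threshold in price units (for ZZ.35 on EURUSD that would be roughly `35 * 0.0001`, assuming a pip of 0.0001). Each bar is labelled by the direction of the ZigZag leg it sits on, so its label depends on a pivot that lies in the future; that is why ZZ.35 can serve as the target but not as an input.

```python
from typing import List, Optional, Sequence


def zigzag_pivots(prices: Sequence[float], min_move: float) -> List[int]:
    """Indices of alternating swing lows/highs.

    A swing is confirmed once price retraces at least `min_move` from the
    running extreme. The last (still unconfirmed) extreme is appended at the end.
    """
    pivots: List[int] = []
    direction = 0          # +1 rising leg, -1 falling leg, 0 not yet known
    lo = hi = extreme = 0  # running min / max / current-leg extreme indices
    for i in range(1, len(prices)):
        p = prices[i]
        if direction == 0:
            if p < prices[lo]:
                lo = i
            if p > prices[hi]:
                hi = i
            if prices[hi] - prices[lo] >= min_move:
                # The first leg runs from the earlier extreme to the later one.
                if lo < hi:
                    pivots, direction, extreme = [lo], 1, hi
                else:
                    pivots, direction, extreme = [hi], -1, lo
        elif direction == 1:
            if p > prices[extreme]:
                extreme = i
            elif prices[extreme] - p >= min_move:
                pivots.append(extreme)  # swing high confirmed
                direction, extreme = -1, i
        else:
            if p < prices[extreme]:
                extreme = i
            elif p - prices[extreme] >= min_move:
                pivots.append(extreme)  # swing low confirmed
                direction, extreme = 1, i
    if pivots and extreme != pivots[-1]:
        pivots.append(extreme)
    return pivots


def trend_labels(prices: Sequence[float], pivots: List[int]) -> List[Optional[int]]:
    """+1 (long) / -1 (short) for bars on a rising / falling ZigZag leg.

    Bars outside the first..last pivot range get None. Note the look-ahead:
    bar i's label is decided by the next pivot, which lies after bar i.
    """
    labels: List[Optional[int]] = [None] * len(prices)
    for a, b in zip(pivots, pivots[1:]):
        leg = 1 if prices[b] > prices[a] else -1
        for i in range(a, b):
            labels[i] = leg
    return labels


closes = [1.0, 0.9, 1.5, 1.4, 1.6, 1.0, 1.1]
piv = zigzag_pivots(closes, min_move=0.5)
print(piv)                        # [1, 4, 5]
print(trend_labels(closes, piv))  # [None, 1, 1, 1, -1, None, None]
```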

What are the vectors *.dif1, *.dif2, *.dif3? Increments? Of what relative to what?

The increment relative to the previous bar, and relative to the bars two (-2) and three (-3) bars back. The idea is to account for trends.
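Read that way, `*.dif1`, `*.dif2` and `*.dif3` are changes of a series relative to 1, 2 and 3 bars back. Here is a minimal Python sketch under that assumption; the function name and the use of `None` for the first bars, where no earlier value exists, are illustrative choices, not taken from the article.

```python
from typing import Dict, List, Optional, Sequence


def lagged_increments(series: Sequence[float],
                      lags: Sequence[int] = (1, 2, 3)) -> Dict[str, List[Optional[float]]]:
    """Return {"dif1": [...], "dif2": [...], ...} with x[i] - x[i - k] for each lag k.

    The first k entries of "difk" are None because bar i - k does not exist yet.
    Only past values are used, so these columns are safe as model inputs.
    """
    out: Dict[str, List[Optional[float]]] = {}
    for k in lags:
        out[f"dif{k}"] = [None if i < k else series[i] - series[i - k]
                          for i in range(len(series))]
    return out


# Illustrative integer values (e.g. prices in points) keep the arithmetic exact.
eur = [13500, 13510, 13490, 13520]
for name, values in lagged_increments(eur).items():
    print(name, values)
```

Unlike the ZigZag columns, every value here is computed from the current bar and earlier ones only.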

And the vectors eur, gbp, etc.: what are they?

These are the quotes of eurusd, gbpusd, etc.; each vector is named after the first letters of the symbol.

I'm hinting at redundancy. Have you tried feeding in just the increments of the close prices?

I've tried many things. The article demonstrates the possibilities and is deliberately redundant, so that everyone can select variables to suit their own ideas and experiment. Selection is very simple: you mark a variable as Ignore.

 

Congratulations on the article, SanSanych!

Of course, it would be more compact in pure R, but this is probably the right place to start.

I'm finishing an article on "deep" learning; we'll try comparing the results on your data.

Good luck

 

I took a close look at the dataset, and it's no good as is. The quotes need to be discarded, that is, the first 48 variables removed. That leaves a dataset with 42 input variables and one target variable.

Well, it's up to the owner; everyone chooses according to their own taste, experience, etc. But I have one remark about the model: RandomForest is wonderful because it needs no preprocessing at all. It handles raw data marvellously.

Otherwise, of course, the article is very useful.

 
vlad1949:

Congratulations on the article, SanSanych!

Of course, it would be more compact in pure R, but this is probably the right place to start.

I'm finishing an article on "deep" learning; we'll try comparing the results on your data.

Good luck

It would be great if you repeated that thread of yours here.

It was very useful material.