Machine learning in trading: theory, models, practice and algo-trading - page 254

 
Dr.Trader:

It looks complicated and I'm not sure of the result, I'll pass.
R has a GMDH package ("MGUA" is the Russian acronym for GMDH).

It's complicated because there are all kinds of algorithms described and it's easy to get confused.

There's a package, but it's "dull".

I'll try to explain in simpler and shorter terms...

It is not some physical self-organizing combinatorial model.

It is simply an exhaustive search over all possible combinations of the sample's elements (predictors), with every candidate of the search checked on OOS. Since the predictors are polynomials or harmonics, these combinations can themselves be combined: you go up a level, build combinations of combinations, check them on OOS again, go up another level, and so on until the minimum OOS error is found. That, as far as I understood it, is the essence of the self-organization.
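A toy sketch of that level-by-level idea, purely illustrative rather than the algorithm from any GMDH package: pair up features, score each pair (plus its product, a simple polynomial) on OOS, and feed the survivors into the next level:

# toy level-by-level search (illustrative only)
set.seed(1)
n <- 300; train <- 1:200; oos <- 201:300
X <- matrix(rnorm(n * 6), ncol = 6)                      # candidate predictors
y <- X[, 1] * X[, 2] + 0.5 * X[, 3] + rnorm(n, sd = 0.1) # made-up target

oosErr <- function(f) {                  # fit on train, score on OOS
  fit  <- lm.fit(cbind(1, f[train, , drop = FALSE]), y[train])
  pred <- cbind(1, f[oos, , drop = FALSE]) %*% fit$coefficients
  mean((y[oos] - pred)^2)
}

feats <- X
best <- Inf
for (level in 1:5) {                     # cap the depth for the sketch
  pairs <- combn(ncol(feats), 2, simplify = FALSE)
  # each candidate = a pair of features plus their product
  cand <- lapply(pairs, function(p)
    cbind(feats[, p], feats[, p[1]] * feats[, p[2]]))
  errs <- sapply(cand, oosErr)
  errs[!is.finite(errs)] <- Inf
  if (min(errs) >= best) break           # stop when OOS error no longer improves
  best  <- min(errs)
  keep  <- order(errs)[1:min(4, length(errs))]
  # survivors of this level become inputs to the next one
  feats <- do.call(cbind, lapply(cand[keep], function(m) m[, 3, drop = FALSE]))
}
best                                     # minimum OOS error reached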

What I suggest....

We all know that any function can be decomposed into a Fourier series, as in the figure:

[figure: a curve decomposed into its Fourier harmonics]

The opposite is also true: with the right combination of harmonics we can get any function we need.
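Both directions are easy to see in R with fft(); the curve below is made up for the illustration:

# decompose an arbitrary curve into harmonics, then rebuild it from them
n  <- 256
tt <- seq(0, 1, length.out = n)
f  <- sin(2 * pi * 3 * tt) + 0.5 * cos(2 * pi * 7 * tt) + 0.2 * tt   # any curve

spec   <- fft(f)                              # forward: complex harmonic amplitudes
f.back <- Re(fft(spec, inverse = TRUE)) / n   # backward: recombine the harmonics

all.equal(f, f.back)   # TRUE: the right combination of harmonics restores the curve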

Now imagine that there is a function (a curve, a vector, call it what you like) that completely describes a market instrument and runs ahead of it... It may be some clever seasonality expressed as a function, or a dependence on other instruments, or lunar cycles )) it does not matter at all, and we mortals do not know it. BUT we can find this dependence without even knowing where to look for it... Let's call this dependence that moves the market the super-dependence.

All that is needed is to

1) take a certain adequate range of harmonics

2) create a target variable

3) start going through all possible combinations of the harmonics and check each one on OOS

Do you understand how simple and deep this is? We do not know where to look for the super-dependence, but we can generate it synthetically ourselves!! And we recognize that it is the super-dependence through the OOS.
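A toy sketch of steps 1)-3): everything below is illustrative (a small bank of harmonics, a made-up target, an ordinary least-squares fit for each subset), with OOS error as the selector:

# 1) a bank of harmonics over some "adequate range" of periods
n  <- 400
tt <- 1:n
train <- 1:300; oos <- 301:400
periods <- c(10, 20, 50, 100, 150)
bank <- do.call(cbind, lapply(periods, function(p)
  cbind(sin(2 * pi * tt / p), cos(2 * pi * tt / p))))

# 2) a target (made up here, standing in for the instrument)
set.seed(2)
target <- sin(2 * pi * tt / 50) + 0.3 * cos(2 * pi * tt / 20) + rnorm(n, sd = 0.1)

# 3) go through every combination of harmonics, fit on train, select by OOS error
best <- list(err = Inf)
for (k in 1:ncol(bank)) {
  for (idx in combn(ncol(bank), k, simplify = FALSE)) {
    fit  <- lm.fit(cbind(1, bank[train, idx, drop = FALSE]), target[train])
    pred <- cbind(1, bank[oos, idx, drop = FALSE]) %*% fit$coefficients
    err  <- mean((target[oos] - pred)^2)
    if (err < best$err) best <- list(err = err, idx = idx)
  }
}
best$idx   # the combination with the minimum OOS error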

That's it. A seemingly simple brute-force enumeration algorithm, but what a powerful idea is behind it; I was simply amazed when I realized it...

But there is a big BUT! There will be trillions of combinations; we need a way around that, and I need your help with it.

Maybe genetics is a way to look for good combinations?
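If genetics is the route, one off-the-shelf option is a binary genetic algorithm in which each bit switches one harmonic on or off and the fitness is the negative OOS error. A sketch using the GA package, reusing bank, target, train and oos from the sketch above:

library(GA)   # install.packages("GA")

# fitness of a bit string: which harmonics are switched on
fitness <- function(bits) {
  idx <- which(bits == 1)
  if (length(idx) == 0) return(-1e9)     # empty combination: worst possible score
  fit  <- lm.fit(cbind(1, bank[train, idx, drop = FALSE]), target[train])
  pred <- cbind(1, bank[oos, idx, drop = FALSE]) %*% fit$coefficients
  -mean((target[oos] - pred)^2)          # ga() maximizes, so negate the OOS error
}

res <- ga(type = "binary", fitness = fitness, nBits = ncol(bank),
          popSize = 50, maxiter = 100, run = 20)
which(res@solution[1, ] == 1)            # the harmonics selected by the GA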

What are your thoughts on the idea and its implementation?

BREAD!!! ......

 

What you described is fitting the result to "good on OOS". First you select model parameters (harmonic combinations) that give a good result on the sample; then from those you select the parameters that are also good on OOS. This is a double fit to a good result, and on new trading data the model will most likely be useless.

With combinations of harmonics you can describe price movement, but the same can be achieved with a neural network or a forest. I think you have already done this hundreds of times with neural networks and it did not work, so why do you think the result of GMDH will be positive? Machine learning in forex cannot be applied as easily as in ordinary problems: price behavior changes over time, and most of the dependencies you find and use will cease to exist very quickly. All these models are designed for constant dependencies that do not change over time, and that is why they do not make a profit on such a problem.

 
mytarmailS:

It's complicated because there are all kinds of algorithms described and it's easy to get confused. [...] What are your thoughts on the idea and its implementation?

I implemented something like this in my indicator. It is very resource-intensive: I can fit it on 1000 bars of history, or on 10000 bars. The bottom line is this: sometimes it works perfectly and hits the price point exactly, but at other times it is not even close. The reason is that all these periods start drifting in the market beyond the fitted point. It might be possible to introduce corrections relative to the current values, but I could not make that work.
 
Dr.Trader:

What you described is fitting the result to "good on OOS". First you select model parameters (harmonic combinations) that give a good result on the sample; then from those you select the parameters that are also good on OOS. This is a double fit to a good result, and on new trading data the model will most likely be useless...

I slept on the idea and realized myself that it's bullshit, it's just a fit...

But here is the question: why is everything else not a fit then? It would follow that any training with an OOS check is a fit to the OOS, right? If not, why not?

Dr.Trader:

Why do you think that the result of GMDH will be positive?

I myself do not understand this at all yet; maybe I never will...

It's just that in private correspondence Nikolai recommended to me: if you want to start building something useful, get acquainted with GMDH and study spectral analysis.

He said that he himself started with this, and that when he built his first working robot he used only two books: one by Ivakhnenko on GMDH and the other by Marple on spectral analysis.

(the translation of the book titles is not exact; I just summarized them to make it clear what he was talking about in general)

Who is Nikolai? Besides being a modest and very smart man, he has a Ph.D. in artificial intelligence and has been building robots for about 20 years; his latest robots look like this:

[photo of one of his robots]

In other words, the man was doing whatever he liked with these networks 30 years ago, at a time when the words "neural network" were not even in our vocabulary......

And now this man recommends only two things, Fourier and GMDH; common sense says it is worth listening... So those are, roughly, my reasons why it should work))

 
Maxim Romanov:
This is roughly the idea I implemented in my indicator. It is very resource-intensive. [...] It might be possible to introduce corrections relative to the current values, but I could not make it work.
I agree with the fact that I was wrong...
 

a question

there is a vector "x" and a matrix "m"

we need to quickly calculate the Euclidean distance between "x" and each row of the matrix "m"

I have beaten the standard dist() function by writing my own:

the standard one:

system.time(for(i in 1:nrow(m)) {dist.ve[i] <- dist(rbind(x,m[i,]))})

   user  system elapsed
   4.38    0.00    4.39

the hand-written one:

system.time(for(i in 1:nrow(m)) {dist.ve[i] <- euc.dist(x,m[i,])})
   user  system elapsed
   0.65    0.00    0.67

but this is not enough; I would like to get it down to the second zero, i.e. 0.0x seconds...

what else can you come up with?

code:

x <- rnorm(10)                                 # the query vector
m <- matrix(data = rnorm(1000000), ncol = 10)  # 100000 rows to compare against

euc.dist <- function(x1, x2) sqrt(sum((x1 - x2) ^ 2))

dist.ve <- rep(0, nrow(m)) # distance vector
system.time(for(i in 1:nrow(m)) {dist.ve[i] <- dist(rbind(x, m[i,]))})  # standard dist()
system.time(for(i in 1:nrow(m)) {dist.ve[i] <- euc.dist(x, m[i,])})     # hand-written
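One standard way to get there is to drop the R-level loop entirely and compute all the distances in one vectorized expression; this should land well below the hand-written loop, since the per-row function-call overhead disappears:

# vectorized: all 100000 distances at once, no R-level loop
# t(m) is 10 x 100000, and x (length 10) recycles down each column,
# so column j of (t(m) - x) holds m[j, ] - x
system.time(dist.vec <- sqrt(colSums((t(m) - x)^2)))

all.equal(dist.vec, dist.ve)   # sanity check against the loop version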
 
mytarmailS:

But here is the question: why is everything else not a fit then? It would follow that any training with an OOS check is a fit to the OOS, right? If not, why not?

Cross-validation and OOS testing are useful on stationary data (data with constant, unchanging dependencies). On non-stationary data they are useless.

You can, for example, train a bunch of models using the same algorithm but on data from different times; for each model, find the profit on the sample and on OOS, and then compute the correlation between those two profits. For the commonly known models, in forex there is usually no such correlation, i.e. profit on the sample guarantees nothing, in which case it is useless to do cross-validation in an attempt to improve results on new data.
On the other hand, if the correlation is high and positive, the model has some potential, and you can safely use cross-validation to fit the model parameters and improve the result.

 

Here is an example for the previous post. We take eurusd open prices for a couple of months, train a model (randomForest) on them, and use it to predict a small stretch of new data. The target value is the price increment on the next bar (two classes, 0 and 1). All of this is repeated 1000 times for different time intervals, and then the correlation is computed.

At the end we see the correlation between the results on the training data and on the new data; in this case it came out to about 0.1, i.e. something in the training needs to change, because this approach will not bring profit. A good result on the training data does not guarantee a good result in the future.

In the TrainModel function you can substitute the training of your own models; you can also add cross-validation, parameter selection by genetics, etc.
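The attached script itself is not reproduced here; a minimal sketch of the procedure under stated assumptions (synthetic prices instead of the eurusd file, 100 windows instead of 1000 repetitions to keep it quick, directional accuracy as the profit proxy, and makeData/runOnce as hypothetical helper names) might look like this:

library(randomForest)

set.seed(42)
price <- cumsum(rnorm(5000))   # synthetic stand-in for the eurusd open prices

# lagged increments as predictors, next-bar direction (0/1) as the target
makeData <- function(p, lags = 5) {
  d <- diff(p)
  X <- embed(d, lags + 1)      # column 1 is the current increment
  data.frame(y = factor(ifelse(X[, 1] > 0, 1, 0)), X[, -1])
}

# train on one stretch, measure accuracy in-sample and on the data right after it
runOnce <- function(p, start, trainLen = 500, oosLen = 100) {
  dat   <- makeData(p[start:(start + trainLen + oosLen)])
  train <- dat[1:trainLen, ]
  oos   <- dat[(trainLen + 1):nrow(dat), ]
  fit   <- randomForest(y ~ ., data = train)
  c(inSample = mean(predict(fit, train) == train$y),
    oos      = mean(predict(fit, oos) == oos$y))
}

# repeat on many different time windows, then correlate the two scores
res <- t(sapply(seq(1, 4000, by = 40), function(s) runOnce(price, s)))
cor(res[, "inSample"], res[, "oos"])   # near zero: in-sample success predicts nothing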

 
Dr.Trader:

Here is an example for the previous post. We take eurusd open prices for a couple of months, train a model (randomForest) on them, and use it to predict a small stretch of new data. [...] A good result on the training data does not guarantee a good result in the future.

1. Why do you normalize manually?

2. Why do you treat both -1 and 1 as a good correlation? Surely only 1 is good; -1 is very bad, if I understand the idea correctly, since -1 is inverse correlation.

3. Have you tried monitoring the error of the trained model in a sliding window and, when it stops being acceptable, retraining the model to see what happens?

4. And a global thought: why is everything so bad? The market is non-stationary, so some other concept for constructing features is needed; maybe move entirely to the paradigm of logical rules. I think we need to get away from raw numbers almost completely, or study spectral analysis)))))

 

Yesterday I was digging through packages in search of the right one to finish off an idea. I did not find the right one, but I did find something interesting...

A package called "trend".

https://cran.r-project.org/web/packages/trend/trend.pdf

the package implements various trend tests and related tools

For example:

mk.test() - the Mann-Kendall trend test; it gives the trend characteristics, e.g. whether it is up or down, etc...

pettitt.test() - as I understand it, finds the point in the vector where the trend started (a change point)

sens.slope() - computes the slope of the trend (Sen's slope)

and there are all sorts of other useful functions in there

I think that as long as it is possible to look at a trend scientifically, it should be checked) I calculated mk.test()$Zg in a sliding window of 200 values on close prices, and what came out is something like an indicator.

Above zero the trend is up, below zero the trend is down
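A rough sketch of that sliding-window indicator on synthetic closes; it assumes the current trend API, where mk.test() returns an htest object whose statistic is the z value (older versions of the package exposed it as $Zg, as used above):

library(trend)   # install.packages("trend")

set.seed(3)
close <- cumsum(rnorm(1000))   # synthetic closes standing in for a real instrument

win <- 200
z   <- rep(NA_real_, length(close))

# Mann-Kendall z statistic over a sliding window of the last `win` closes
for (i in win:length(close)) {
  w    <- close[(i - win + 1):i]
  z[i] <- unname(mk.test(w)$statistic)   # $Zg in older versions of the package
}

plot(z, type = "l"); abline(h = 0, lty = 2)   # above zero: up-trend, below: down-trend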


And what do you know? It catches trends and does not confuse the price direction the way MACD and Stochastic do... In general, I like that it always opens positions in the direction of the trend...

On some periods it even earns good money.


If you take a timeframe other than M5, there may be something useful in it.

It's just that my computer has been running at full load for four days now, and the laptop I am sitting at only lets me browse websites).
