Machine learning in trading: theory, models, practice and algo-trading - page 1777

 

I heard a clever idea somewhere, something like: if you have features with at least some statistical significance, even the most minimal, then by combining them you can get accuracy close to 100%.

I decided to check it out...

I made a synthetic dataset with a binary target and tied each feature to the target with a certain hit probability.

I made 10 such features, with probabilities of 51:49 for one target class and 49:51 for the other.
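A minimal sketch of how data like this could be generated, assuming the simplest reading of the setup: a fair-coin target, and each feature copies the target with probability p and is flipped otherwise, independently of the other features. The author's own code is not shown, so `make_synthetic` and `hit_rates` are hypothetical names. The resulting lists could be fed to any classifier (for example scikit-learn's `RandomForestClassifier`) to repeat the forest step.

```python
import random

def make_synthetic(n_rows, n_features, p, seed=None):
    """Hypothetical generator: binary target in {0, 1}; every feature equals the
    target with probability p and is flipped otherwise, so p = 0.51 gives the
    51:49 split for one class and 49:51 for the other. Features are
    independent of each other given the target."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n_rows)]
    X = [[t if rng.random() < p else 1 - t for _ in range(n_features)] for t in y]
    return X, y

def hit_rates(X, y):
    """Fraction of rows in which each feature equals the target."""
    n_features = len(X[0])
    return [sum(row[j] == t for row, t in zip(X, y)) / len(y) for j in range(n_features)]

X_train, y_train = make_synthetic(5000, 10, 0.51, seed=1)
X_new, y_new = make_synthetic(2000, 10, 0.51, seed=2)   # separate "new data" for testing
print([round(r, 3) for r in hit_rates(X_train, y_train)])
```

Each printed hit rate should sit near 0.51, i.e. every single feature is barely better than a coin flip on its own.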

I trained a random forest.

Then I tested it on new data:

 Accuracy : 0.5145   

With 100 features instead of 10:

 Accuracy : 0.534 

With 1000 features:

Accuracy : 0.558 

The conclusion: we need to improve the quality of the features; quantity alone won't get us far...

Let's raise the probability to, say, 55:45.

10 features give...

Accuracy : 0.6055 

100 features give

Accuracy : 0.7985    

Let's raise the probability again, to 60:40.

10 features

Accuracy : 0.729 

100 features

 Accuracy : 0.968 


So it turns out that to live the good life in Sochi, you need 100 rules/features/ML models on every candle, each giving 60% correct answers... and they also have to be different from one another at the same time... I wonder if that's even possible?
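For intuition, the idealized version of this experiment has a closed form (the idea behind Condorcet's jury theorem): if n features are independent and each is right with probability p, a simple majority vote is right with probability P(K > n/2), where K ~ Binomial(n, p). The sketch below assumes exactly that (independence, equal quality, plain voting), which is a swapped-in combiner, not the random forest used above; the function name is hypothetical. For 10 features at 60:40 it gives about 0.733, close to the 0.729 the forest reached. The independence assumption is the "have to be different" part: correlated features would score lower.

```python
from math import exp, lgamma, log

def majority_vote_accuracy(n_features, p):
    """Probability that a simple majority vote of n independent binary features,
    each correct with probability p (0 < p < 1), picks the right class.
    Ties (possible only for even n) are broken by a fair coin and count as half."""
    acc = 0.0
    for k in range(n_features + 1):            # k = number of features that are right
        log_pmf = (lgamma(n_features + 1) - lgamma(k + 1) - lgamma(n_features - k + 1)
                   + k * log(p) + (n_features - k) * log(1 - p))
        prob_k = exp(log_pmf)                  # binomial P(K = k), computed in log space
        if 2 * k > n_features:
            acc += prob_k
        elif 2 * k == n_features:
            acc += 0.5 * prob_k
    return acc

for p in (0.51, 0.55, 0.60):
    for n in (10, 100, 1000):
        print(f"p={p:.2f}  n={n:4d}  vote accuracy={majority_vote_accuracy(n, p):.4f}")
```

Working in log space avoids overflowing the binomial coefficient for large n; terms that underflow to zero are negligible anyway.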

 
mytarmailS:

I heard a clever idea somewhere, something like: if you have features with at least some statistical significance, even the most minimal, then by combining them you can get accuracy close to 100%.

I decided to check it out...


Correlation of EURCAD with other pairs on daily data:

        AUDCHF  CADCHF  CHFJPY  EURCHF
EURCAD   -0.22   -0.33   -0.39    0.37


Coefficient of determination of the simplest linear regression, EURCAD = a*AUDCHF + b*CADCHF + c*CHFJPY + d*EURCHF + k:

R^2 = 0.99622555

 
Dmitry:

Correlation of EURCAD with other pairs on daily data

Correlation is not a prediction but a measure. Or am I missing the point?

 
mytarmailS:

Correlation is not a prediction but a measure. Or am I missing the point?

The correlation shows the statistical significance of each variable, and it is low.

Together, they form a model that explains 99.6% of the variation in the dependent variable.

 
Dmitry:

The correlation shows the statistical significance of each variable, and it is low.

Together, they form a model that explains 99.6% of the variation in the dependent variable.

Well, yes, but it explains rather than predicts; correlation is just a measure of the relationship between variables. What conclusion are you drawing from this? I still don't get it (

If we are looking for cross-correlation between pairs
 
mytarmailS:

Well, yes, but it explains rather than predicts; correlation is just a measure of the relationship between variables. What conclusion are you drawing from this? I still don't get it (

"I heard a clever idea somewhere, something like: if you have features with at least some statistical significance, even the most minimal, then by combining them you can get accuracy close to 100%" (c)

Correlation shows the statistical significance of the independent variables for predicting the dependent variable in a linear regression model.

 
Dmitry:

"I heard a clever idea somewhere, something like: if you have features with at least some statistical significance, even the most minimal, then by combining them you can get accuracy close to 100%" (c)

I meant features that can actually predict something, not just correlate.

 
mytarmailS:

I meant features that can actually predict something, not just correlate.

And how is the ability to predict determined?

 
mytarmailS:

I meant features that can actually predict something, not just correlate.

Here we have a dependent variable and a set of possible independent variables.

How is "predictive ability" determined?

By blindly shoving everything in the world into the model?

 
mytarmailS:

Well... nice and believable. I'd like to see the trading balance and a chart with the entries.

You never told me how to trade on it, which is why I don't know what kind of TS (trading system) to build.

mytarmailS:

As far as I understand, it is an ensemble of 10 models. How do the models differ from one another?

No, it's just 10 models to see the spread; the only difference between them is the seed, i.e. the random value used to start training (it is used when evaluating and selecting splits).
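The "same model, 10 seeds" procedure can be sketched generically: run one training routine per seed on fixed data and summarize the scores, so the spread reflects only training randomness. `seed_spread` and `toy_train_and_score` are hypothetical names, and the toy model (a majority vote over a seed-chosen random subset of features) is a stand-in for the forest, which also samples features at random when choosing splits.

```python
import random
import statistics

def seed_spread(train_and_score, seeds):
    """Run the same training procedure once per seed and summarize the scores.
    Only the seed changes between runs, so the spread reflects training randomness."""
    scores = [train_and_score(seed) for seed in seeds]
    return {
        "mean": statistics.fmean(scores),
        "stdev": statistics.stdev(scores),
        "min": min(scores),
        "max": max(scores),
    }

# Fixed synthetic data (assumption): each feature equals the target with probability P.
data_rng = random.Random(42)            # the data never changes; only the model seed does
N_ROWS, N_FEATURES, P = 2000, 100, 0.55
y = [data_rng.randint(0, 1) for _ in range(N_ROWS)]
X = [[t if data_rng.random() < P else 1 - t for _ in range(N_FEATURES)] for t in y]

def toy_train_and_score(seed, k=31):
    """Hypothetical randomized model: majority vote over k features picked by the seed."""
    rng = random.Random(seed)
    cols = rng.sample(range(N_FEATURES), k)   # odd k, so the vote never ties
    correct = 0
    for row, target in zip(X, y):
        votes = sum(row[c] for c in cols)     # how many chosen features say "1"
        pred = 1 if 2 * votes > k else 0
        correct += pred == target
    return correct / N_ROWS

summary = seed_spread(toy_train_and_score, seeds=range(10))
print(summary)
```

With a real forest the same `seed_spread` wrapper applies unchanged; only the training function passed in differs.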
