Machine learning in trading: theory, models, practice and algo-trading - page 365

 
elibrarius:

Removing the correlated inputs is something I have already done; now I'm thinking about how else to improve the inputs.

So, I agree with you that there should be correlation with the target, which is why I want to remove the inputs least correlated with the target, for example those with K_corr < 0.5 or 0.3. This should speed up training without much degradation. But I suspect I'll end up having to delete all the inputs )))

On the inputs I use (taken at random from technical indicators) I haven't found any correlation with the target so far; the training error = 0.44, i.e. almost a coin flip. And the balance curve is going down.
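A minimal sketch of the threshold filter described above, assuming the features and target sit in a pandas DataFrame; the column names are hypothetical:

```python
import pandas as pd

def filter_by_target_correlation(df: pd.DataFrame, target: str,
                                 min_abs_corr: float = 0.3) -> list[str]:
    """Keep only features whose absolute Pearson correlation
    with the target column is at least min_abs_corr."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr[corr.abs() >= min_abs_corr].index.tolist()

# Hypothetical usage: df holds indicator columns plus a 'target' column.
# selected = filter_by_target_correlation(df, 'target', min_abs_corr=0.3)
# print(selected)  # may well come back empty, as the poster suspects
```

Note the filter keeps the absolute value of the coefficient, so a strong inverse correlation passes too, which matters for the objection raised later in the thread.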


Why should there be any correlation with the target at all? Where is that even written? What sense does that make? If you had a correlation of 1 with the target, you would know the future and wouldn't need a neural network.
 
Maxim Dmitrievsky:

Why should there be any correlation with the target at all? Where is that even written? What sense does that make? If you had a correlation of 1 with the target, you would know the future and wouldn't need a neural network.


All ML is based on the premise that the input variables must correlate with the output variable.

Otherwise ALL ML models are pointless.

In data mining, ALL variable-selection methods implement a mechanism that maximizes the correlation between the input variables and the output variable (a minimal sketch of the first one follows the list):

Forward Selection procedure,
Backward Elimination procedure,
Stepwise procedure,
Best Subsets procedure.
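A minimal sketch of the Forward Selection procedure named above: greedily add whichever feature most improves cross-validated R². The scikit-learn calls are standard; the data and function names are hypothetical (scikit-learn also ships a ready-made SequentialFeatureSelector that does the same job):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def forward_selection(X: np.ndarray, y: np.ndarray) -> list[int]:
    """Greedy forward selection: grow the feature subset while
    cross-validated R^2 keeps improving."""
    remaining = list(range(X.shape[1]))
    chosen: list[int] = []
    best_score = -np.inf
    while remaining:
        # score every candidate feature added to the current subset
        scores = [(np.mean(cross_val_score(LinearRegression(),
                                           X[:, chosen + [j]], y, cv=5)), j)
                  for j in remaining]
        score, j = max(scores)
        if score <= best_score:  # no candidate improves the model -> stop
            break
        best_score = score
        chosen.append(j)
        remaining.remove(j)
    return chosen
```

Backward Elimination runs the same loop in reverse (start with everything, drop the least useful feature), and Stepwise alternates the two.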
 
Dimitri:


All ML is based on the premise that the input variables must correlate with the output variable.

Otherwise ALL ML models are pointless.

In data mining, ALL variable-selection methods implement a mechanism that maximizes the correlation between the input variables and the output variable:

Forward Selection procedure,
Backward Elimination procedure,
Stepwise procedure,
Best Subsets procedure.

Correlate in what sense: that the input and output vectors (curves) must correlate, or does correlation simply mean the dependence of the output variable on the input variable, in a general sense?
 
Maxim Dmitrievsky:

Correlate in what sense: that the input and output vectors (curves) must correlate, or does correlation simply mean the dependence of the output variable on the input variable, in a general sense?


Dependence is a special case of correlation. If two variables are dependent, there is definitely correlation. If there is correlation, there is not necessarily dependence.

Statistical models have no methods for detecting dependence. There is only the hope that the correlation identified between a set of input variables and the output variable is a real relationship.

Therefore, the variables must correlate.

 
Dimitri:


Dependence is a special case of correlation. If two variables are dependent, there is definitely correlation. If there is correlation, there is not necessarily dependence.

Statistical models have no methods for detecting dependence. There is only the hope that the correlation identified between a set of input variables and the output variable is a real relationship.

Therefore, the variables must correlate.


And if the correlation is inverse, is it no longer a dependence, or what? ) The NN will get confused by this approach.

Hallelujah... a zigzag at the input and the same zigzag at the output with an offset... the correlation is almost perfect, but what's the point? )
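A minimal sketch of that zigzag objection: a series correlates almost perfectly with its own shifted copy, yet the number tells you nothing useful about predicting it. The triangular wave below is a hypothetical stand-in for a ZigZag indicator:

```python
import numpy as np

t = np.arange(1000)
zigzag = np.abs((t % 100) - 50).astype(float)  # simple triangular "zigzag"
shifted = np.roll(zigzag, 1)                   # the same curve, offset by one bar

# drop index 0 to avoid the wrap-around element introduced by np.roll
r = np.corrcoef(zigzag[1:], shifted[1:])[0, 1]
print(f"correlation with the shifted copy: {r:.4f}")  # very close to 1
```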

 
Maxim Dmitrievsky:

And if the correlation is inverse, is it no longer a dependence, or what? ) The NN will get confused by this approach.


No correlation is when the correlation coefficient is 0.

How can you build a model if the inputs and the output do not correlate at all?

 
Dimitri:


No correlation is when the correlation coefficient is 0.

How can you build a model if the inputs and the output do not correlate at all?


Because the correlation of inputs with the output does not matter at all when the model is looking for patterns in a set of predictors... And it is a contradiction to remove correlated inputs while hunting for inputs correlated with the output... )) That is, as soon as we have at least one input correlated with the output, we would have to remove all the other inputs, since they too are correlated with the output and, consequently, with each other... cool, huh?
 
Maxim Dmitrievsky:

Because the correlation of inputs with the output does not matter at all when the model is looking for patterns in a set of predictors... And it is a contradiction to remove correlated inputs while hunting for inputs correlated with the output... )) That is, as soon as we have at least one input correlated with the output, we would have to remove all the other inputs, since they too are correlated with the output and, consequently, with each other... cool, huh?


No, not cool.

If your first variable correlates with the output variable with a coefficient of, say, 0.7 and the second with a coefficient of 0.65, that does not at all mean that the two variables are highly correlated with each other.

Now imagine that the first correlates with a coefficient of 0.7 and the second with a coefficient of -0.69.
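A minimal numeric sketch of that point: construct x1 with corr(x1, y) ≈ 0.7 and x2 with corr(x2, y) ≈ -0.69 from independent noise; their mutual correlation then comes out near 0.7 × (-0.69) ≈ -0.48, nowhere near "highly correlated". All names and data here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y = rng.standard_normal(n)
# mix y with independent noise so each predictor has the desired correlation
x1 = 0.70 * y + np.sqrt(1 - 0.70**2) * rng.standard_normal(n)
x2 = -0.69 * y + np.sqrt(1 - 0.69**2) * rng.standard_normal(n)

print(np.corrcoef(x1, y)[0, 1])   # ~  0.70
print(np.corrcoef(x2, y)[0, 1])   # ~ -0.69
print(np.corrcoef(x1, x2)[0, 1])  # ~ -0.48: both track the target, not each other
```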

 
Dimitri:


No, not cool.

If your first variable correlates with the output variable with a coefficient of, say, 0.7 and the second with a coefficient of 0.65, that does not at all mean that the two variables are highly correlated with each other.

Now imagine that the first correlates with a coefficient of 0.7 and the second with a coefficient of -0.69.


And if you also consider that correlation defines "similarity" in a very peculiar way... I wouldn't give it much credence.

We are building an accurate, high-tech neural network, yet we pick predictors by correlation? Something about that feels a little off... but that's all just "in my opinion"... )

 
Maxim Dmitrievsky:

And if you also consider that correlation defines "similarity" in a very peculiar way... I wouldn't give it much credence.


Then there is the second option: feed everything you have into the NN. But there are two BUTs (a small sketch after the list illustrates the cost):

1. You hope that the uncorrelated variables do not degrade the quality of the model (for regression there is such a result).

2. You sacrifice dimensionality and time.
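A minimal sketch of that trade-off, on purely synthetic, hypothetical data: padding a handful of informative features with hundreds of uncorrelated ones mainly costs dimensionality and fit time, which is easy to measure directly:

```python
import time
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n = 20_000
X_useful = rng.standard_normal((n, 10))
y = X_useful @ rng.standard_normal(10) + 0.1 * rng.standard_normal(n)
X_noise = rng.standard_normal((n, 990))        # uncorrelated junk inputs
X_all = np.hstack([X_useful, X_noise])

for name, X in [("10 useful features", X_useful), ("1000 features", X_all)]:
    t0 = time.perf_counter()
    Ridge().fit(X, y)
    print(f"{name}: fit in {time.perf_counter() - t0:.3f}s")
```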
