Machine learning in trading: theory, models, practice and algo-trading - page 3

 

You can also go the route of throwing out near-duplicate inputs. For each pair of columns, calculate the mean deviation between them; that way you can find the two most similar columns and throw one of them out. Which of the two to drop can be decided by its average deviation from all the other columns, and so on. A rough sketch of the idea is below.
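Roughly, in R it could look like this (just a sketch of the procedure described above, not code from this thread; the function name drop_most_similar is made up):

# Sketch: for every pair of columns compute the mean absolute deviation
# between them, find the most similar pair, and drop the member of that
# pair that is on average closest to all the other columns.
drop_most_similar <- function(X) {
  k <- ncol(X)
  d <- matrix(Inf, k, k)
  for (i in 1:(k - 1))
    for (j in (i + 1):k)
      d[i, j] <- d[j, i] <- mean(abs(X[, i] - X[, j]))  # pairwise mean deviation
  pair  <- which(d == min(d), arr.ind = TRUE)[1, ]      # the two most similar columns
  score <- sapply(pair, function(i) mean(d[i, -i]))     # average deviation to the rest
  X[, -pair[which.min(score)], drop = FALSE]            # drop the more redundant one
}

Calling it repeatedly removes one near-duplicate column per call, until the remaining columns are sufficiently different.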

 
Dr.Trader:

Visually, all the weights split into two groups. If you need to divide them into significant and non-significant, then 5, 11, 7, 1, 3, 9 clearly stand out; I think this set is enough.

This is the right solution!

I didn't know an NN (neural network) could do that. This is a revelation to me.

It turns out that an NN can model interactions.

I owe you a prize. Thank you!

 

It is true that the NN gives no idea of the form of the pattern, so the interpretability of the model is questionable.

I'll post the logic for creating the dataset later and talk about the type of dependency.

 
Dr.Trader:

Visually, all the weights split into two groups. If you need to divide them into significant and non-significant, then 5, 11, 7, 1, 3, 9 clearly stand out; I think this set is enough.

If you are interested, try removing any one of the significant predictors from the dataset, retraining the NN, and plotting the weights again. I think there will be a surprise.

This is already beyond the assignment, just to develop the topic further.

 
Alexey Burnakov:

If you are interested, try removing any one of the significant predictors from the dataset, retraining the NN, and plotting the weights again. I think there will be a surprise.

This is already beyond the assignment, just to develop the topic further.

The stove is a very important piece of the interior, in the sense that you should always "dance from the stove", i.e. start from the very beginning.

Dr.Trader,

Your example with the NN only shows that the NN liked the predictors you mentioned more and did not like the others. Algorithms that do this sort of thing are a dime a dozen.

All this would be fine if such a selection were made among predictors that are relevant, i.e. that have predictive power for the target variable.

In my practice, an arbitrary set of predictors always contains some that have no (or only a very weak) relation to the target variable. The sneaky thing is that, given a certain number of such noise predictors, or some random selection of their values, most algorithms, NNs included, fail to distinguish the informative predictors from the noisy ones.

Therefore "the stove" here is the preliminary step that clears the hopeless, noise predictors out of the initial set, and only after that everything else.

PS.

I have not worked with NNs, but with a certain number of noise predictors random forests, by their built-in selection algorithm, tend to push out the informative predictors in favour of the noise. As a result, on that noise they show extraordinary performance, with an error of less than 5%!

PPS.

The presence of noise predictors necessarily leads to overfitting of the model, with everything that implies for the real world.
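To illustrate that last point, a quick toy sketch in R (not an experiment from this thread, just the general effect): a random forest fitted to pure noise looks almost perfect on the training data and is a coin flip on fresh data.

# Toy illustration: a random forest trained on pure noise overfits badly.
library(randomForest)
set.seed(3)
n      <- 500
noise  <- data.frame(matrix(rnorm(n * 30), ncol = 30))   # 30 noise predictors
target <- factor(sample(0:1, n, replace = TRUE))          # random 0/1 target
rf <- randomForest(noise, target, ntree = 500)
mean(predict(rf, noise) != target)                        # in-sample error: near 0
fresh <- data.frame(matrix(rnorm(n * 30), ncol = 30))     # new, unrelated noise
mean(predict(rf, fresh) != factor(sample(0:1, n, replace = TRUE)))  # about 50%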

 
SanSanych Fomenko:

The stove is a very important piece of the interior, in the sense that you should always "dance from the stove", i.e. start from the very beginning.

Dr.Trader,

Your example with the NN only shows that the NN liked the predictors you mentioned more and did not like the others. Algorithms that do this sort of thing are a dime a dozen.

All this would be fine if such a selection were made among predictors that are relevant, i.e. that have predictive power for the target variable.

In my practice, an arbitrary set of predictors always contains some that have no (or only a very weak) relation to the target variable. The sneaky thing is that, given a certain number of such noise predictors, or some random selection of their values, most algorithms, NNs included, fail to distinguish the informative predictors from the noisy ones.

Therefore "the stove" here is the preliminary step that clears the hopeless, noise predictors out of the initial set, and only after that everything else.

PS.

I have not worked with NNs, but with a certain number of noise predictors random forests, by their built-in selection algorithm, tend to push out the informative predictors in favour of the noise. As a result, on that noise they show extraordinary performance, with an error of less than 5%!

PPS.

The presence of noise predictors necessarily leads to overfitting of the model, with everything that implies for the real world.

The NN did very well.

Random forest could not handle this kind of task, where there is an interaction among a set of variables and the individual significance of each predictor was deliberately made zero.

 

Glad it worked out :), thanks for the prize.

I tried removing one input at a time (4 cases). If I remove input_5 or input_9, nothing works any more: a network with the same configuration does not even train down to an error below 50% and mostly just outputs either 0 or 1.

If you remove input_20, everything is fine and the result is correct. The funny thing is input_15: if you remove it, the network again does not train properly, with the same problems as when removing input_5 or input_9. I did not test further.

I attached a file with the R code for training the network, in case anyone is interested. It is basically just slightly modified code from Rattle's log.

Files:
r_nnet.zip  3 kb
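The general shape of that kind of Rattle-style nnet training is roughly the following (a simplified sketch, not the attached file itself; the file name in the commented read line is made up, and the data frame is assumed to have columns input_1..input_20 plus a 0/1 target):

# Simplified sketch of Rattle-style nnet training (not the attached file).
library(nnet)
# data <- read.csv("dataset.csv")   # hypothetical file with input_1..input_20 and target
set.seed(42)
train <- sample(nrow(data), 0.7 * nrow(data))
net <- nnet(target ~ ., data = data[train, ], size = 10, decay = 0.1,
            maxit = 1000, trace = FALSE)
pred <- as.integer(predict(net, data[-train, ]) > 0.5)
mean(pred == data$target[-train])   # out-of-sample accuracy
summary(net)                        # prints the fitted weights, connection by connection
# To repeat the "remove one input" experiment, drop a column and retrain, e.g.:
# net5 <- nnet(target ~ . - input_5, data = data[train, ], size = 10,
#              decay = 0.1, maxit = 1000, trace = FALSE)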
 
Dr.Trader:

Glad it worked out :), thanks for the prize.

I tried removing one input at a time (4 cases). If I remove input_5 or input_9, nothing works any more: a network with the same configuration does not even train down to an error below 50% and mostly just outputs either 0 or 1.

If you remove input_20, everything is fine and the result is correct. The funny thing is input_15: if you remove it, the network again does not train properly, with the same problems as when removing input_5 or input_9. I did not test further.

I attached a file with the R code for training the network, in case anyone is interested. It is basically just slightly modified code from Rattle's log.

Message me with your card or e-wallet number.
 

Well, in general the process is clear: the network simply tries to fit the available data into some logic, and if some of the inputs carry no new information, it minimizes their influence so they do no harm. It is unlikely to find complex interactions between inputs, I agree.

Also, the nnet package in R is not exactly an ordinary neural network. Judging by its description, it uses second-order optimization: usually the weights are updated from the first derivatives (the gradient), whereas here the second derivatives also come into play, and during training an approximation of the Hessian matrix is built, which accumulates curvature information about all the weights over the training examples. They say it is very good, so this package should be strong. https://ru.wikipedia.org/wiki/Алгоритм_Бройдена_-_Флетчера_-_Гольдфарба_-_Шанно (the Broyden-Fletcher-Goldfarb-Shanno, BFGS, algorithm); I did not fully understand it, but if someone here is a mathematician they will figure it out.
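For reference, the standard BFGS update of the Hessian approximation $B_k$ looks roughly like this, where $s_k$ is the step taken in the weights and $y_k$ is the corresponding change in the gradient of the error:

$$B_{k+1} = B_k + \frac{y_k y_k^{\top}}{y_k^{\top} s_k} - \frac{B_k s_k s_k^{\top} B_k}{s_k^{\top} B_k s_k}$$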

 
About the strange results: if you remove any one of the significant predictors, nothing works at all. That is the interaction.

Each predictor on its own says nothing about the state of the output, so algorithms that evaluate individual significance will not work. Decision trees and random forests will also almost certainly fail, since they too consider predictors one at a time. A huge forest of tens of thousands of trees could by chance join the significant predictors in a single branch, and then everything would work, but that is unlikely.

Why?

Interaction means that information flows to the output from several predictors jointly. The dependency is constructed so that the sum of the significant predictors is even or odd with roughly 50/50 probability: if it is even, the output is 1, otherwise 0. That is why removing even one of the significant predictors breaks the dependency, and adding extra predictors can make it so noisy that the statistics no longer show significance. A sketch of this kind of construction is below.
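A minimal sketch in R of such a parity-style target (a reconstruction from the description above, not the actual generator, which has not been posted yet; which columns are significant and the value range are assumptions):

# Sketch of a dataset with a parity interaction (assumed construction).
set.seed(1)
n <- 2000
X <- matrix(sample(0:9, n * 20, replace = TRUE), ncol = 20)   # 20 integer inputs
colnames(X) <- paste0("input_", 1:20)
sig <- c(1, 3, 5, 7, 9, 11)                                   # the "significant" columns
target <- as.integer(rowSums(X[, sig]) %% 2 == 0)             # 1 if their sum is even, else 0
data <- data.frame(X, target)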

I am really surprised that an ordinary NN was able to detect such a relationship. Now I am starting to believe in the MLP's ability to detect meaningful inputs. Yay.

All in all, you are right on the money. If you try to train a random forest it almost certainly won't work.

I'm also pretty sure that logistic regression will fail.

In short, this problem needs a stochastic enumeration of different subsets of predictors with a suitable fitness function. Or an NN ))) A sketch of such a search is below.
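For example, a random search over predictor subsets with hold-out accuracy as the fitness (only an illustration of the idea, not the method I will post later; it reuses the data frame from the sketch above):

# Random search over predictor subsets, scored by hold-out accuracy of a small NN.
library(nnet)
fitness <- function(cols, data, train_idx) {
  f   <- as.formula(paste("target ~", paste(cols, collapse = " + ")))
  net <- nnet(f, data = data[train_idx, ], size = 10, decay = 0.01,
              maxit = 500, trace = FALSE)
  pred <- as.integer(predict(net, data[-train_idx, ]) > 0.5)
  mean(pred == data$target[-train_idx])            # hold-out accuracy as fitness
}
set.seed(2)
train_idx <- sample(nrow(data), 0.7 * nrow(data))
best <- list(acc = 0, cols = NULL)
for (i in 1:200) {                                 # try 200 random subsets
  cols <- sample(paste0("input_", 1:20), sample(3:8, 1))
  acc  <- fitness(cols, data, train_idx)
  if (acc > best$acc) best <- list(acc = acc, cols = cols)
}
best$cols                                          # the best subset found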

I'll post my method later.

Maybe someone else will try another way to select predictors and then we can compare results.
