Machine learning in trading: theory, models, practice and algo-trading - page 3312

 
СанСаныч Фоменко #:

Let me clarify my point.

Any ML algorithm tries to reduce error. Error reduction is more effective on rubbish, because "convenient" values for reducing error are far more common in rubbish. As a result, the "importance" of rubbish predictors will reliably come out higher than that of NOT-rubbish ones. That is why preprocessing exists, and it is far more labour-intensive than the actual model fitting.

In this artificial example we are considering (not market data),
the Y-axis feature is not rubbish and separates the classes very well. The X-axis feature is rubbish because the two classes are mixed roughly evenly along it.

The tree will easily split the data in examples 1 and 2 from the picture with just one split at Y=0.5, with absolute class purity, i.e. class probability = 100%. When a split on the X axis is tested, its purity will be about 50%, so the algorithm will select the cleaner split, the one on Y. I.e. your claim that a rubbish split on X will be selected is incorrect for these examples.

The 3rd example is more complicated. The leaf with Y<0.2 will be selected by the algorithm, since its class purity = 100%; the leaf with Y>0.8 will be selected too.
The leaf from 0.2 to 0.8 has a purity of about 50%, i.e. it is about as much rubbish as any split on X.
Further splitting makes no sense, since you won't use leaves with a 50% class probability.
If you do something silly and keep splitting this rubbish part down to 1 example per leaf, then splits on both Y and X will be used. Of course, with 1 example in a leaf its purity = 100%, but such leaves are not representative. Only beginners do that.

The first 3 leaves are enough, or you can stop splitting leaves once they hold, say, 1-5-10% of the total number of examples. And in this example use only leaves with purity, for example, >90%, which will be the first 2 leaves: Y<0.2 and Y>0.8. The remaining leaves will be 50% ±10% due to uneven mixing.
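The split-selection argument above can be checked with a tiny sketch: evaluate candidate thresholds on both features and keep the purest split. The data, thresholds, and purity measure below are illustrative assumptions, not the poster's actual experiment.

```python
import random

random.seed(0)
n = 2000
# Two features per example: X is "rubbish" (random), Y determines the class.
data = [(random.random(), random.random()) for _ in range(n)]
labels = [1 if y > 0.5 else 0 for (_, y) in data]

def split_purity(feature_idx, threshold):
    """Weighted purity (majority-class fraction) of the two sides of a split."""
    left  = [l for row, l in zip(data, labels) if row[feature_idx] <= threshold]
    right = [l for row, l in zip(data, labels) if row[feature_idx] >  threshold]
    def purity(side):
        if not side:
            return 1.0
        p = sum(side) / len(side)
        return max(p, 1 - p)
    return (len(left) * purity(left) + len(right) * purity(right)) / n

# Try the same candidate thresholds on both features, keep the purest split.
candidates = [i / 20 for i in range(1, 20)]
best = max(
    ((f, t, split_purity(f, t)) for f in (0, 1) for t in candidates),
    key=lambda x: x[2],
)
print(best)  # -> (1, 0.5, 1.0): the split on Y at 0.5, with 100% purity
```

Every split on the rubbish X feature scores near 0.5, so a greedy tree never prefers it over the informative Y split.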


 
Renat Akhtyamov #:

... I would try feeding my indicator into a neural net as predictors and try to identify the signs of rubbish and non-rubbish

And what's stopping you from trying?

 
Andrey Dik #:

No one knows what is rubbish and what is not; these are hypothetical concepts.

If we knew exactly what is what, there wouldn't be a 3K-page thread)).

One simply assumes that going beyond such-and-such limits is "rubbish", and those limits are hypothetical too. That is why the expression "rubbish in - rubbish out" is nothing more than a pretty phrase: what is rubbish for one researcher is not rubbish for another. It is like Elliott waves.

There is no need to speak for everyone.

Most likely you do not know which examples are "rubbish" and which are not. For you it is a hypothetical concept. If you knew what is what, you wouldn't be sitting in this thread writing profound generalisations for everyone.

When will you learn the basics of ML? A rhetorical question.

 
Vladimir Perervenko #:

There is no need to speak for everyone.

Most likely you do not know which examples are "rubbish" and which are not. For you it is a hypothetical concept. If you knew what is what, you wouldn't be sitting in this thread writing profound generalisations for everyone.

When will you learn the basics of ML? A rhetorical question.


Your post does not show that you know what is rubbish and what is not.
Besides, and this is the funny part: if you already know what is not rubbish, then there is no need for ML at all.

That is the very purpose and goal of ML: to separate the wheat from the chaff.

If you know, what are you doing here?

 

In physics, signals that interfere with the signal we need are usually considered rubbish. Any signal, any action is caused by something; it is called rubbish only because it is not needed and distorts the assessment of the signal the researcher is after. So, strictly speaking, there is no rubbish in nature))

Here, when looking for patterns in price, inefficiencies or anything else, the signal being evaluated is the impact of some real events, or their combination, on the price. All other influences are rubbish.

Not claiming this judgement is the truth, of course))

 
Valeriy Yastremskiy #:

In physics, signals that interfere with the signal we need are usually considered rubbish. Any signal, any action is caused by something; it is called rubbish only because it is not needed and distorts the assessment of the signal the researcher is after. So, strictly speaking, there is no rubbish in nature))

Here, when looking for patterns in price, inefficiencies or anything else, the signal being evaluated is the impact of some real events, or their combination, on the price. All other influences are rubbish.

Not claiming this judgement is the truth, of course))

If we dive into DSP theory, it goes like this:

a useful signal without rubbish is initially known (e.g. a trend line or some curve)

then, on the next tick, the useful signal is subtracted from the total mass of signals, and the signals that are not needed, i.e. rubbish, are identified.
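A minimal sketch of that subtraction step, with an assumed linear trend standing in for the known useful signal (the trend coefficients and noise model here are made up for illustration):

```python
import random

random.seed(1)
n = 100
trend = [0.5 * t + 3.0 for t in range(n)]          # the known useful signal
noise = [random.gauss(0, 0.2) for _ in range(n)]   # everything else ("rubbish")
observed = [u + e for u, e in zip(trend, noise)]   # the total mass of signals

# Subtract the known useful signal; what remains is the unneeded part.
residual = [o - u for o, u in zip(observed, trend)]

# The residual recovers the injected noise (up to float rounding).
print(max(abs(r - e) for r, e in zip(residual, noise)) < 1e-9)  # -> True
```

The point is only the order of operations: the useful signal must be known first; the rubbish is defined as whatever the subtraction leaves behind.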

 
Whether a feature is rubbish is evaluated in relation to a specific target, and vice versa. If there is no cause-and-effect relationship, then the dataset is rubbish in its entirety, or one of its components is. And often the problem is not the features but incorrect labelling.

After all, even rubbish can be sorted in a way that makes it useful: by type or size, for example.
 
Ivan Butko #:

Can you please tell me what is NOT rubbish? I have never seen anyone talk about clean input data, but I hear about rubbish on this forum all the time.

What is it? If you talk about rubbish, you must have seen NOT rubbish as well, otherwise there is nothing to compare it with.

NOT rubbish is a predictor that is related to / has an influence on the teacher (the target). Here is the proxy package, full of algorithms for separating rubbish from NOT rubbish. Far from the only one in R, by the way.

For example, a moving average is rubbish for a price-increment target, as is any smoothing algorithm.

proxy: Distance and Similarity Measures (cran.r-project.org)
Provides an extensible framework for the efficient calculation of auto- and cross-proximities, along with implementations of the most popular ones.
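proxy itself is an R package; as a rough Python analogue of such screening, one can rank each predictor by a similarity measure to the target (here, absolute Pearson correlation) and drop those near zero. The data, threshold, and feature names below are assumptions for illustration:

```python
import random

random.seed(2)
n = 5000
target = [random.gauss(0, 1) for _ in range(n)]            # e.g. price increments
informative = [t + random.gauss(0, 0.5) for t in target]   # related to the target
rubbish = [random.gauss(0, 1) for _ in range(n)]           # unrelated to the target

def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

# Keep predictors whose similarity to the target clears a chosen threshold.
for name, feat in [("informative", informative), ("rubbish", rubbish)]:
    r = abs(pearson(feat, target))
    print(name, "keep" if r > 0.1 else "drop")
# -> informative keep
# -> rubbish drop
```

Correlation is only one of many measures such packages offer; the screening pattern (score each feature against the target, then cut) is the same regardless of the measure.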
 
mytarmailS #:
Preprocessing is about normalisation, not rubbish.
Dealing with rubbish is feature selection and, partly, feature engineering.

Sanych, stop feeding rubbish to impressionable people.

If you mean the feature selection built into models, I completely disagree, because built-in feature selection will rank just about any rubbish.

 
Sanych, when will we finally remember that the Teacher is features + target?)

Such childish blunders, from supposedly serious people who have mastered the great R, spoil the whole atmosphere. And people, it turns out, are unteachable, no matter how much you correct them.

How can I talk to you if you are still confused about the basics?)

No offence, but you do not even understand each other, or what each of you is writing about :))))