Machine learning in trading: theory, models, practice and algo-trading

 
Maxim Dmitrievsky:
You're making it up as you go along.) 24 columns, not 100. You asked for the file yourself. There are no mistakes (I explained that already). There are 300-odd lines because I gave you a year of data, so your 'generator' wouldn't choke on the counting.) But go on. I haven't had time to finish watching it, but the beginning is promising. I'll leave a full review later. It looks like I'll have to answer in video format.
Yes, Maxim, sorry, it was not your training file, but I think the point of the message is clear. 24 columns simply cannot explain 2000 vectors without repeats. It's just physically impossible....
 
Mihail Marchukajtes:

I have a raw file of 7,700 columns from which I take 24, so don't go on; look here instead. Here's your file.

And here is mine.

What's the difference???? I'm not going to stall. In principal component analysis, where each column is its own coordinate axis, what matters is that the points can be clustered once points from different columns are projected onto one coordinate system common to them. The interpretation is simple: the more vertical and horizontal the vectors, the better. In your case it's just a uniform blob.

I looked into it a bit; in fact the angles between the vectors show the correlation (90° = zero correlation). I'm feeding in increment lags, and there's no correlation there; it's like white noise.

The fact that you make do with 50 training examples (50 rows) is surprising; the network must be very shallow for that. A lot of examples are needed to discard the unnecessary functions (ideally down to the single one that can describe the data).
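
A minimal sketch (my own illustration; none of this code is from the thread) of the angle/correlation relationship described above: for standardized variables, the cosine of the angle between two PCA loading vectors approximates their correlation, so an angle near 90° means near-zero correlation.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = rng.normal(size=500)
data = np.column_stack([
    x,                                     # variable 0
    0.9 * x + 0.1 * rng.normal(size=500),  # strongly correlated with variable 0
    rng.normal(size=500),                  # roughly uncorrelated "white noise"
])
data = (data - data.mean(axis=0)) / data.std(axis=0)  # standardize

pca = PCA(n_components=2).fit(data)
# Loading vectors: components scaled by the std dev of the component scores.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

def angle_deg(a, b):
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

print("angle(var0, var1):", angle_deg(loadings[0], loadings[1]))  # small angle
print("angle(var0, var2):", angle_deg(loadings[0], loadings[2]))  # near 90 degrees
print(np.corrcoef(data.T).round(2))  # compare with the actual correlations
```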

 
Mihail Marchukajtes:
Yes, Maxim, sorry, it was not your training file, but I think the point of the message is clear. 24 columns simply cannot explain 2000 vectors without repeats. It's just physically impossible....


I'll do a dance next time.

 
Maxim Dmitrievsky:

I'll do a dance next time.

vtreat doesn't use PCA; it's not about that at all. I didn't understand myself what they do,

they preprocess missing values and so on, plus they create new features but don't position it as feature induction, and they evaluate feature importance but don't position it as feature selection, so I don't know what it is or how it works.



Regarding the claim that "PCA is linear, and what you've done there can just be thrown in the trash":

I bet I can recover the price from a PCA decomposition on new data with roughly 98% accuracy.

I can prove it, so you're wrong here. Maybe it's even good that it's linear; otherwise I wouldn't be able to reassemble it.
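
A minimal sketch of how such a claim could be checked, assuming a simple lag-matrix setup; the data and the `lag_matrix` helper are made up for illustration, since the actual files aren't shown in the thread. Fit PCA on training rows, then transform and inverse_transform held-out rows and score the reconstruction.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
price = np.cumsum(rng.normal(size=1200))  # synthetic price path (random walk)

def lag_matrix(series, n_lags=24):
    # Each row is a window of n_lags consecutive prices.
    return np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])

X = lag_matrix(price)
X_train, X_new = X[:900], X[900:]         # hold out the tail as "new data"

pca = PCA(n_components=5).fit(X_train)
X_rec = pca.inverse_transform(pca.transform(X_new))

# R^2 of the reconstruction as one possible notion of "accuracy".
ss_res = np.sum((X_new - X_rec) ** 2)
ss_tot = np.sum((X_new - X_new.mean()) ** 2)
print("reconstruction R^2 on new data:", 1.0 - ss_res / ss_tot)
```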

 
Maxim Dmitrievsky:


I'll do a dance next time.

In general I agree with the terminology: columns are inputs (explanatory variables), rows are training vectors or examples. For some learning algorithms it may be critical when there are fewer rows than columns, but when the rows come to outnumber the columns you get nearly identical examples that pull the model into overfitting. It is not possible to describe 350 examples (rows) with 24 explanatory variables (columns) without allowing repetition.

I don't use PCA at all; it was just an example for another user. With it you can evaluate how separable the resulting set is.


P.S. The bit with the song counts. Well done!

 
So theoretically the best matrix for learning is a square one, where the number of columns and rows is the same.... Hmm... By the way, that gives me an idea: take exactly as many training examples as there are columns left after preprocessing.... That's a thought... so it is.... A square matrix is guaranteed to have no repeats....
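
Whether a matrix of any shape actually contains repeated rows is easy to check directly; a tiny sketch with made-up binary inputs (my own, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(350, 24))    # 350 examples, 24 binary inputs

n_unique = np.unique(X, axis=0).shape[0]
print("rows:", X.shape[0], "unique rows:", n_unique)
# 24 binary columns allow 2**24 (about 16.7 million) distinct rows, so
# repeats only become likely when the inputs are coarse or dependent.
```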
 
Mihail Marchukajtes:

In general I agree with the terminology: columns are inputs (explanatory variables), rows are training vectors or examples. For some learning algorithms it may be critical when there are fewer rows than columns, but when the rows come to outnumber the columns you get nearly identical examples that pull the model into overfitting. It is not possible to describe 350 examples (rows) with 24 explanatory variables (columns) without allowing repetition.

I don't use PCA at all; it was just an example for another user. With it you can evaluate how separable the resulting set is.


P.S. The bit with the song counts. Well done!

Only if the class labels occur in very different numbers (the classes are not balanced).

I've been struggling with you for nothing... the most famous magician and wizard)))

 
Mihail Marchukajtes:

You get nearly identical examples that pull the model into overfitting.

these "same examples" are exactly what create statistically significant structures - what is statistics? it's when something repeats and you can draw some conclusions from it

Mihail Marchukajtes:

It is not possible to describe 350 examples (rows) with 24 explanatory variables (columns) without allowing repetition.

What do you have against repetitions????

 
mytarmailS:

these "same examples" are exactly what create statistically significant structures - what is statistics? it's when something repeats and you can draw some conclusions from it

What do you have against repetitions????

There are three kinds of lies: lies, damned lies, and statistics. - Mark Twain.

Repetition leads to rote memorization; we need the network to generalize. That is, we should feed it one unique vector of each kind, so that when a new vector appears nearby, the network reacts adequately. If there is a group of identical vectors in the set, the network simply memorizes them....

In other words, the algorithm will assign an unjustifiably high weight to those nearly identical vectors....
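
A minimal sketch (my own illustration) of that effect: duplicating a training row fits the same model as giving that row a proportionally larger sample weight, which is exactly the "unjustifiably high weight" being described.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# Duplicate the last example five extra times...
X_dup = np.vstack([X, np.repeat([[3.0]], 5, axis=0)])
y_dup = np.concatenate([y, np.ones(5, dtype=int)])

# ...which fits the same model as weighting that example 6x.
m_dup = LogisticRegression().fit(X_dup, y_dup)
m_w = LogisticRegression().fit(X, y, sample_weight=[1, 1, 1, 6])
print(m_dup.coef_, m_w.coef_)  # near-identical coefficients
```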

 
Maxim Dmitrievsky:

Only if the class labels occur in very different numbers (the classes are not balanced).

I've been struggling with you for nothing... the most famous magician and wizard)))

From the start I've been talking about two classes, no more. If there are three or more classes, then a table of unique vectors with more rows than columns can be built, but their uniqueness will be defined exclusively by the target.
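
A tiny sketch (hypothetical data, my own) of that distinction: rows that coincide in feature space can still be unique once the target label is appended, i.e. uniqueness defined exclusively by the target.

```python
import numpy as np

X = np.array([[0, 1], [0, 1], [1, 0]])    # two identical feature rows
y = np.array([0, 1, 1])                   # ...carrying different labels

print("unique feature rows:", np.unique(X, axis=0).shape[0])        # 2
print("unique rows incl. target:",
      np.unique(np.column_stack([X, y]), axis=0).shape[0])          # 3
```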