Machine learning in trading: theory, models, practice and algo-trading - page 3322

 
Maxim Dmitrievsky #:
Nobody was discussing anything with you; you butted into a conversation about model complexity, or reacted to a post, and that is why you are known as the optimiser.

That's it, Maxim. There is no point in further discussion; everything has already been settled. Sanych is watching this thread, the one who has argued with me for ages that searching for a global is harmful, to which I replied "it depends on what kind of global", and Sanych still hasn't realised that, in the end, he is searching for a global himself.

I hope to return the discussion to a calm, friendly and constructive direction.

And everyone who reads this thread regularly remembers who said what and when, so don't play dumb - it is useless. I made a mistake today - I drew the chart wrongly - and I calmly admitted the mistake and said so. It is normal to admit your mistakes, Maxim.

 
Andrey Dik #:

Why not? Yeah. I don't have a different idea, it's just that a lot of people don't like to call things by their proper names for some reason.

When exactly to stop training is a matter of methodology; I was only emphasising that it is impossible to stop training without detecting a global extremum (well, except forcibly, which has no practical use).

I just see the opponents misunderstanding each other - earlier the conversation was about optimisation, and about overfitting in the tester and training a model being different things. In machine learning, predictor selection is usually a separate task, while in the terminal tester you can often change what those predictors report - a simple example is searching over indicator settings - and at the same time the tester lets you use variables to process the values of those indicators: moving thresholds, comparison rules, coefficients with different logical meanings.

Therefore, without seeing the Expert Advisor's code, one cannot say unambiguously what the optimisation process amounts to: creating rules, searching for predictors that work better with fixed rules, or both at once. Hence, in my opinion, the whole argument on that topic.
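To make the distinction concrete, a minimal sketch (the indicator, parameter names and values are all hypothetical): the SMA period changes the predictor itself, while the threshold changes the rule applied to it, and a tester-style grid search quietly mixes the two.

```python
import numpy as np

def sma(prices, period):
    # Simple moving average: the 'predictor'; its period is an optimisable input.
    return np.convolve(prices, np.ones(period) / period, mode="valid")

def signal(prices, sma_period, threshold):
    # Trading rule: compare the price's deviation from the SMA to a threshold.
    # sma_period changes the predictor itself; threshold changes the rule.
    ma = sma(prices, sma_period)
    deviation = prices[sma_period - 1:] - ma
    return np.where(deviation > threshold, 1,
                    np.where(deviation < -threshold, -1, 0))

prices = np.cumsum(np.random.randn(1000)) + 100  # synthetic price series

# A grid over BOTH kinds of parameters: this is what a tester optimisation
# often mixes together without the EA's code making the split explicit.
for sma_period in (10, 20, 50):        # predictor search
    for threshold in (0.5, 1.0, 2.0):  # rule search
        s = signal(prices, sma_period, threshold)
        # ...evaluate the P&L of s here and keep the best combination
```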

Considering that "overfitting" gets applied to anything and everything: Maxim argues that this is not supervised learning, since the markup logic may be missing and the situations are not similar to each other, which complicates learning. And rightly so - in essence we have many different classes grouped together on a like/dislike basis, and they may not resemble each other in their attributes at all. Earlier I suggested a method of gradually weeding out such data through incremental learning: use a trained model to flag the contradictory part of the initial sample, then train on the remaining data. It is not the only solution; the topic really deserves attention.
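A minimal sketch of that weeding idea as I read it (the library, the single pass, and the confidence cutoff are my assumptions, not the original procedure): train once, drop the samples the model confidently contradicts, retrain on the rest.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def weed_and_retrain(X, y, confidence=0.8):
    # Pass 1: fit on everything, then flag samples the model confidently
    # contradicts - likely mislabelled or inconsistent with similar rows.
    first = GradientBoostingClassifier().fit(X, y)
    proba = first.predict_proba(X)
    pred = first.classes_[np.argmax(proba, axis=1)]
    contradictory = (pred != y) & (proba.max(axis=1) > confidence)
    # Pass 2: retrain only on the non-contradictory remainder.
    keep = ~contradictory
    second = GradientBoostingClassifier().fit(X[keep], y[keep])
    return second, keep

# Toy usage: clean labels with 10% noise injected
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
y = (X[:, 0] > 0).astype(int)
y[rng.choice(500, 50, replace=False)] ^= 1  # flip 50 labels
model, kept = weed_and_retrain(X, y)
print(f"kept {kept.sum()} of {len(y)} samples")
```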

About the stopping criterion: here, of course, you can choose your own criterion and search for its optimal value, depending on what matters more in the model's responses. However, such a criterion is not strictly required in machine learning - sometimes you simply set a fixed number of iterations, i.e. you train without any stopping criterion at all. The real question is always a different one: how to choose a model that will keep working on new data. It is better to look for criteria that answer that question.
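Both variants, sketched with scikit-learn's gradient boosting (the library is an assumption; the post names none): a fixed iteration count versus an explicit early-stopping criterion on a held-out validation score.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Variant 1: no stopping criterion at all, just a fixed number of iterations.
fixed = GradientBoostingClassifier(n_estimators=300, random_state=0).fit(X, y)

# Variant 2: an explicit criterion - stop once the score on a held-out
# validation fraction has not improved for 10 consecutive iterations.
early = GradientBoostingClassifier(
    n_estimators=300,
    validation_fraction=0.2,
    n_iter_no_change=10,
    tol=1e-4,
    random_state=0,
).fit(X, y)

# early.n_estimators_ is usually well below 300
print(fixed.n_estimators_, early.n_estimators_)
```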

 
Andrey Dik #:


Thus, learning in machine learning can be seen as optimisation, where the goal is to find the combination of model parameters that minimises the loss function and achieves the best model performance.
I disagree. This is only a small part of ML. Transformers, causal learning in ML are definitely not optimisation in the general case.
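For reference, the quoted claim in its narrowest form: fitting parameters by minimising a loss, here plain gradient descent on a toy regression (pure illustration, nothing market-specific).

```python
import numpy as np

# Toy data: y = 2x + 1 plus noise; 'learning' here means finding the
# parameters w, b that minimise the mean squared error loss.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 2 * x + 1 + 0.1 * rng.standard_normal(200)

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    err = w * x + b - y
    w -= lr * 2 * np.mean(err * x)  # dL/dw of the MSE loss
    b -= lr * 2 * np.mean(err)      # dL/db
print(w, b)  # converges towards (2, 1), the loss-minimising combination
```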
 
Andrey Dik #:

Bingo!

Now you have finally realised that any learning is nothing but optimisation with a search for a global extremum. Or maybe you haven't realised it yet, but you will.

It cannot be otherwise: you always need an unambiguous criterion to stop training, and that criterion is always designed to be a global extremum. Usually an integral criterion is constructed (though not always). You yourself have named integral criteria.

I always thought that searching for the extrema of a function was functional analysis - in the same way that developing an algorithm can be called optimisation, since after all we choose the best one according to some criteria)
 
Oh these terms and their interpretation))))))
It's like some kind of sacred business)))))
 

The problem with neural networks in Python is the beautiful statistical pictures at the macro scale. As soon as you start dissecting them, it turns out the network degenerates into a moving average: the prediction stays close to the price, but it does not guess the direction. Direction is guessed 50/50, so it is impossible to trade on these networks.

I tried the articles here, and with ChatGPT we put together 10,000 neurons in 3 layers, then 10 layers of 1,000 neurons, then one layer of 100,000 neurons (my RTX 3080 was completely maxed out, and if you take more, Python reported there was not enough memory), and RNN, and LSTM, and CNN, and CNN-LSTM, and CNN-BiLSTM, and CNN-BiLSTM-MLP with two kinds of regularisation and dropout, and Q-learning. Only DQN didn't happen: Chat wrote an Actor-Critic over several pages, but the code turned out to have bugs that neither I nor Chat could fix.

ALL of it - doesn't work. The result is the same everywhere: it turns into a moving average. As inputs I tried everything possible from the "what to feed a neural network" thread, and much else besides.
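The "turns into a moving average" effect is easy to demonstrate numerically: a model that predicts something very close to the last known price scores a flattering MAE while its directional accuracy stays at coin-flip level. A sketch on synthetic random-walk data (not the poster's actual models):

```python
import numpy as np

rng = np.random.default_rng(42)
price = np.cumsum(rng.standard_normal(5000)) + 1000  # random-walk 'prices'

# A degenerate 'model': tomorrow equals today plus tiny noise - roughly
# what an over-smoothed network converges to on this kind of series.
pred = price[:-1] + 0.01 * rng.standard_normal(len(price) - 1)
actual = price[1:]

mae = np.mean(np.abs(pred - actual))       # small -> the chart looks great
pred_dir = np.sign(pred - price[:-1])      # predicted up/down
actual_dir = np.sign(actual - price[:-1])  # actual up/down
hit_rate = np.mean(pred_dir == actual_dir) # ~50% -> untradeable

print(f"MAE: {mae:.3f}, directional hit rate: {hit_rate:.1%}")
```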

And here I am chopping onions in the kitchen, YouTube playing in the background, and the recommendations serve up some curly-haired guy who is about to build a neural network that predicts prices. Okay, come on, I say to myself, give it a try.

So he opens Google Colab and starts writing Python code there (I think it's Python). Throws in the closing prices - daily bitcoin closes, if I'm not mistaken. Trains it. Checks it. At this point I wiped the onion tears from my eyes and started looking at the result.

The result is as follows: the prediction runs next to the actual price, but... it gets the direction right. That is, say it was 35000: prediction 37500, fact 37100. Next step: forecast 35700, actual 35300. Forecast 34000, fact 35000. And so on. He wrote a network that predicts not just the next price but, I think, the next 12 prices in a row at a time. And these coincided in direction after each step.


Question: is it really possible to write something that works in Python?
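For anyone wanting to reproduce the setup from the video, the usual shape is a network with a 12-dimensional output head; a minimal Keras sketch (my guess at the architecture, trained on synthetic closes, not the video's actual code):

```python
import numpy as np
from tensorflow import keras

LOOKBACK, HORIZON = 60, 12  # 60 past closes in, the next 12 closes out

# Synthetic daily closes stand in for the video's bitcoin data.
closes = np.cumsum(np.random.randn(3000)) + 30000
n = len(closes) - LOOKBACK - HORIZON
X = np.array([closes[i:i + LOOKBACK] for i in range(n)])
y = np.array([closes[i + LOOKBACK:i + LOOKBACK + HORIZON] for i in range(n)])

model = keras.Sequential([
    keras.layers.Input(shape=(LOOKBACK,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(HORIZON),  # all 12 prices predicted in one shot
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=10, batch_size=32, verbose=0)

forecast = model.predict(closes[-LOOKBACK:][None, :])  # shape (1, 12)
```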

 
Andrey Dik #:

That's it, Maxim. There is no point in further discussion; everything has already been settled. Sanych is watching this thread, the one who has argued with me for ages that searching for a global is harmful, to which I replied "it depends on what kind of global", and Sanych still hasn't realised that, in the end, he is searching for a global himself.

I hope to return the discussion to a calm, friendly and constructive direction.

And everyone who reads this thread regularly remembers who said what and when, so don't play dumb - it is useless. I made a mistake today - I drew the chart wrongly - and I calmly admitted the mistake and said so. It is normal to admit your mistakes, Maxim.

Don't speak for me.

We were discussing the tester, not ML.

In ML one does not look for an optimum, but for the error to coincide on train, test, and validation. Then you run it step by step on one more file. The error should be approximately the same everywhere.

There is no room for optimisation here.
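A minimal sketch of that consistency check across the four files (toy data; the 5% tolerance is an arbitrary choice of mine, not from the post):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_split(n):
    # Toy stand-in for one of the four files (train/test/validation/forward).
    X = rng.standard_normal((n, 8))
    y = (X[:, 0] > 0).astype(int)
    return X, y

X_train, y_train = make_split(2000)
splits = {
    "train": (X_train, y_train),
    "test": make_split(500),
    "validation": make_split(500),
    "forward file": make_split(500),
}

model = RandomForestClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
errors = {name: 1 - accuracy_score(y, model.predict(X))
          for name, (X, y) in splits.items()}
print(errors)

# Acceptance rule from the post: keep the model only if the error is
# approximately the same everywhere.
spread = max(errors.values()) - min(errors.values())
print("accept" if spread < 0.05 else "reject: errors diverge across files")
```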

 
СанСаныч Фоменко #:

Don't speak for me.

We were discussing the tester, not ML.

In ML one does not look for an optimum, but for the error to coincide on train, test, and validation. Then you run it step by step on one more file. The error should be approximately the same everywhere.

There is no room for optimisation here.

The part in red is optimisation; in your case it would sound like "reducing the variance" or "fixing the variance at a given value", depending on what you are doing)))

 
Andrey Dik #:

The part in red is optimisation; in your case it would sound like "reducing the variance" or "fixing the variance at a given value", depending on what you are doing)))

Somehow you are just not taking the text in.

Optimisation(?) is possible ONLY on the train set on which the model is trained. In training there is an algorithm, and under a magnifying glass you can make out optimisation in it. Then comes testing of that model, which has NO feedback into the "optimisation", because the model has no parameters that could affect the testing results. To affect them you have to do something about the predictors and/or the teacher (the target).

In ML - meaning preprocessing, model fitting and model evaluation - there can NOT be any optimisation, because the resulting error is a property of these three steps.

 
СанСаныч Фоменко #:

Somehow you are just not taking the text in.

Optimisation(?) is possible ONLY on the train set on which the model is trained. In training there is an algorithm, and under a magnifying glass you can make out optimisation in it. Then comes testing of that model, which has NO feedback into the "optimisation", because the model has no parameters that could affect the testing results. To affect them you have to do something about the predictors and/or the teacher (the target).

In ML - meaning preprocessing, model fitting and model evaluation - there can NOT be any optimisation, because the resulting error is a property of these three steps.

Somehow you are just not taking the text in.

Did you read what was written above? Do you want a random result or the best result? Whatever you do, you are optimising: combining methods-shmethods, pre-, post-processing-shmocessing, all to achieve the best result.

Calling a lemon sweet doesn't make it any sweeter (or however the saying goes). I've put it this way, and that way, and another way - but no! Sanych does no optimisation whatsoever.

I've written too much about it today, that's enough.