Machine learning in trading: theory, models, practice and algo-trading - page 3082

 
Maxim Dmitrievsky #:
I'm very clear about what I'm writing, otherwise I wouldn't write it. You aren't. Stop floundering, you're annoying.

Study the material, then we'll discuss it. If you can't, I won't be upset. Spoon-feeding is a job for other people.

Maxim, I posted earlier the translation I managed to get. Frankly, from it I came to conclusions similar to SanSanych Fomenko's. I admit the translation may be distorted, since a lot of it just sounds strange: first it's "sample treatment", then "fitting of indicators"....

That's why I suggest you explain, in your own words, what nobody has understood, at least in thesis form. Maybe after that I will read the text differently.

Here is an excerpt from the translation; is everything in it clear?


 
SanSanych Fomenko #:

It's not in the article.

The usual fitting with different divisions of the original predictors, including cross-validation, is described. A routine that has been camouflaged with words.

I have a question for machine learning connoisseurs. If I use one symbol's data for training, another symbol's data for validation, and a third symbol's data for testing, is this good practice?

Also, I am getting the following results from the test data: green cells are very good, yellow cells are good, red cells are average.


And also a question about modifying the training data. I noticed that the model has a hard time with the extrema, in my case values above 60 and below 40 (0.6 and 0.4 on the scaled outputs).
So I find the values above 60 and below 40 in the training data and append them to the training data once more before feeding it to the model. The question is: can I improve the model's accuracy by enlarging the training data with the samples that carry information about the extrema?

import numpy as np

# Drop duplicate rows; keep the outputs aligned with the surviving inputs
inputs_unique, indices = np.unique(inputs, axis=0, return_index=True)
outputs_unique = outputs[indices]

# Find indices where outputs_unique is above 0.6 (60 before scaling)
indices_greater_than_60 = np.where(outputs_unique > 0.6)

# Get the corresponding inputs_unique and outputs_unique rows
filtered_inputs_greater = inputs_unique[indices_greater_than_60]
filtered_outputs_greater = outputs_unique[indices_greater_than_60]

# Append them once more, duplicating the upper-extreme samples
inputs_unique = np.concatenate((inputs_unique, filtered_inputs_greater), axis=0)
outputs_unique = np.concatenate((outputs_unique, filtered_outputs_greater), axis=0)

# Find indices where outputs_unique is below 0.4 (40 before scaling)
indices_smaller_than_40 = np.where(outputs_unique < 0.4)

# Get the corresponding inputs_unique and outputs_unique rows
filtered_inputs_smaller = inputs_unique[indices_smaller_than_40]
filtered_outputs_smaller = outputs_unique[indices_smaller_than_40]

# Append them once more, duplicating the lower-extreme samples
inputs_unique = np.concatenate((inputs_unique, filtered_inputs_smaller), axis=0)
outputs_unique = np.concatenate((outputs_unique, filtered_outputs_smaller), axis=0)
 
Chapter 1 Introduction | An R Companion for Introduction to Data Mining
Michael Hahsler (mhahsler.github.io)
 

 
Lilita Bogachkova #:

I have a question for machine learning experts. If I use one symbol's data for training, another symbol's data for validation, and a third symbol's data for testing, is this good practice?

briefly NO.

you train the model to recognise watermelons, test on apples, validate on...
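For a single symbol, the usual alternative is to split its history chronologically into train, validation, and test periods. A minimal sketch; the function name and split fractions are illustrative, not a recommendation:

```python
import numpy as np

def chronological_split(X, y, train_frac=0.6, valid_frac=0.2):
    """Split one symbol's history into train/valid/test without shuffling."""
    n = len(X)
    i_tr = int(n * train_frac)
    i_va = int(n * (train_frac + valid_frac))
    return (X[:i_tr], y[:i_tr],
            X[i_tr:i_va], y[i_tr:i_va],
            X[i_va:], y[i_va:])

# 100 consecutive bars of one symbol (toy data)
X = np.arange(100).reshape(-1, 1)
y = np.arange(100, dtype=float)
X_tr, y_tr, X_va, y_va, X_te, y_te = chronological_split(X, y)
```

Keeping the three periods in time order avoids leaking future information into training, which shuffled splits on market data would do.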

 
Lilita Bogachkova #:

I have a question for machine learning experts. If I use one symbol's data for training, another symbol's data for validation, and a third symbol's data for testing, is this good practice?

Try using the same symbol with noise added.
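A minimal sketch of that noise-augmentation idea, assuming numeric features; the noise scale and number of copies are arbitrary choices here, not a recommendation:

```python
import numpy as np

rng = np.random.default_rng(42)

def augment_with_noise(X, y, noise_frac=0.01, copies=2):
    """Append noisy copies of the feature rows; targets are reused as-is.

    The noise std is proportional to each feature's own std, so the
    perturbation respects the scale of every column.
    """
    scale = X.std(axis=0) * noise_frac
    X_parts, y_parts = [X], [y]
    for _ in range(copies):
        X_parts.append(X + rng.normal(0.0, scale, size=X.shape))
        y_parts.append(y)
    return np.concatenate(X_parts), np.concatenate(y_parts)

X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
X_aug, y_aug = augment_with_noise(X, y)
```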

 
Rorschach #:

Try using the same symbol with noise added.

I think it's better to shift the time, if it's not ticks.

Noise distorts the data, and noise has parameters of its own, and it's not clear which ones to choose. And in general, why not generate quotes from noise, as I did recently?
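"Quotes from noise" in their simplest form are just a random walk over Gaussian increments. A toy sketch; the start price and volatility are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_quotes(n=1000, start=1.10, vol=0.0005):
    """Synthetic price series: start price plus cumulated Gaussian returns."""
    increments = rng.normal(0.0, vol, size=n)
    return start + np.cumsum(increments)

prices = synthetic_quotes()
```

Such a series has no real structure by construction, which is exactly why it is useful as a null benchmark: a model that "finds patterns" in it is fitting noise.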

 
Maxim Dmitrievsky #:

To turn nuisance parameters into functions, you can use the outputs of RF or any base algorithm, as in the article. For the completely uninformed: replace the values of the selected parameters with the function values. Then linear regression (or any other algorithm) becomes the meta-learner through which the treatment effect is estimated. Why and how all this works: learn the maths.

To understand it, it is enough to start thinking with your own head. But Sanych will start talking nonsense again, because he just wants to say something without thinking. Sanych, your lack of understanding is so deep that you cite RF parameters as some kind of proof, which is absurd. I've written to you three times: forget about RF. For the last time: study the topic, then rant. Otherwise equally unlearned people will blindly believe you.

And do not respond to my posts with the aplomb of a know-it-all (which is annoying), because you know nothing, and it reads like the ravings of a vocational-school dropout.

All references to sources are given in the article. Do you need to be poked at every word, like blind kittens? Or are you adults after all?
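One possible reading of the nuisance-function idea, sketched in the spirit of double/debiased ML on synthetic data. This is my reading, not necessarily the article's exact recipe, and every model choice below is illustrative: a base algorithm (RF here) produces out-of-fold function values for the outcome and the treatment variable, and a linear regression on the residuals plays the meta-learner that estimates the treatment effect.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 5))                   # nuisance covariates
t = X[:, 0] + rng.normal(size=n)              # "treatment" variable
y = 2.0 * t + X[:, 0] + rng.normal(size=n)    # true treatment effect = 2

# Base algorithm supplies out-of-fold function values for y and t
rf = lambda: RandomForestRegressor(n_estimators=50, random_state=0)
y_hat = cross_val_predict(rf(), X, y, cv=3)
t_hat = cross_val_predict(rf(), X, t, cv=3)

# Meta-learner: regress outcome residuals on treatment residuals;
# the slope is the estimated treatment effect
meta = LinearRegression().fit((t - t_hat).reshape(-1, 1), y - y_hat)
effect = meta.coef_[0]
```

Using out-of-fold predictions for the nuisance functions is what keeps the meta-learner from simply inheriting the base model's overfitting.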

You're the one quoting something else....

We trained the model well, took the selected predictors and replaced them with the values predicted by the model, then trained the model again. We compare the result via RMSE for regression models/data. If the result improved, then the replaced predictors changed their properties over the training period, or what?
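As I understand the procedure being described, it can be sketched on synthetic data; which column to replace and which models to use are arbitrary choices here:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(7)
n = 400
X = rng.normal(size=(n, 4))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)
X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]

def fit_rmse(X_tr, y_tr, X_te, y_te):
    m = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)
    return mean_squared_error(y_te, m.predict(X_te)) ** 0.5

rmse_before = fit_rmse(X_tr, y_tr, X_te, y_te)

# Replace predictor column 2 with values predicted from the other columns,
# retrain, and compare RMSE on the same test period
aux = RandomForestRegressor(n_estimators=50, random_state=0).fit(
    np.delete(X_tr, 2, axis=1), X_tr[:, 2])
X_tr2, X_te2 = X_tr.copy(), X_te.copy()
X_tr2[:, 2] = aux.predict(np.delete(X_tr, 2, axis=1))
X_te2[:, 2] = aux.predict(np.delete(X_te, 2, axis=1))
rmse_after = fit_rmse(X_tr2, y_tr, X_te2, y_te)
```

Comparing `rmse_before` and `rmse_after` then shows whether substituting the model-predicted values for that predictor helped or hurt on the held-out period.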

 
Lilita Bogachkova #:

I have a question for machine learning experts. If I use one symbol's data for training, another symbol's data for validation, and a third symbol's data for testing, is this good practice?

Also, I get the following results from the test data: green cells are very good, yellow cells are good, red cells are average.

I'm not an expert, but I'll share my thoughts.

Few people manage to get a model that works successfully on different symbols, so it can be considered a good achievement if that is indeed the case. I assume the model sees patterns that are realised with equal probability on each symbol.

Lilita Bogachkova #:

And also a question about modifying the training data. I noticed that the model has a hard time with the extrema, in my case values above 60 and below 40.
So I find the values above 60 and below 40 in the training data and append them to the training data once more before feeding it to the model. The question is: is it possible to improve the model's accuracy by enlarging the training data with the samples that carry information about the extrema?

If you add more samples, the model may find a unifying pattern in them, provided one exists through the prism of the predictors used.
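An alternative to physically duplicating the extreme rows is to up-weight them through the sample_weight argument that most learners accept in fit(). A minimal sketch; the thresholds and the boost factor are illustrative:

```python
import numpy as np

def extreme_weights(y, low=0.4, high=0.6, boost=3.0):
    """Per-sample weights: targets outside [low, high] count `boost` times."""
    w = np.ones_like(y, dtype=float)
    w[(y < low) | (y > high)] = boost
    return w

y = np.array([0.1, 0.5, 0.7, 0.45, 0.95])
w = extreme_weights(y)
# w can then be passed as model.fit(X, y, sample_weight=w)
```

This has the same effect on the loss as row duplication but keeps the dataset size and any deduplication logic intact.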

 
Maxim Dmitrievsky #:

To turn nuisance parameters into functions, you can use the outputs of RF or any base algorithm, as in the article. For the completely uninformed: replace the values of the selected parameters with the function values. Then linear regression (or any other algorithm) becomes the meta-learner through which the treatment effect is estimated. Why and how all this works: learn the maths.

To understand it, it is enough to start thinking with your own head. But Sanych will start talking nonsense again, because he just wants to say something without thinking. Sanych, your lack of understanding is so deep that you cite RF parameters as some kind of proof, which is absurd. I've written to you three times: forget about RF. For the last time: study the topic, then rant. Otherwise equally unlearned people will blindly believe you.

And do not respond to my posts with the aplomb of a know-it-all (which is annoying), because you know nothing, and it reads like the ravings of a vocational-school dropout.

All references to sources are given in the article. Do you need to be poked at every word, like blind kittens? Or are you adults after all?

I was discussing the article, not your veiled jabs, of which, judging by the list of literature you refer to, I believe there are plenty.

If you have such a burning desire to continue discussing the article you posted, I am ready to continue, but: only the article, only arguments on the merits, and in a form that excludes outright rudeness on your part.

The article discussed RF. I did not see there any other functions that would compute the fitting error, or perform the fitting itself. So please be so kind as to take the text of the article and provide a specific quote that refutes this statement.
