How to train a model to predict S&P500 values from 1960 to 1999 inclusive - Articles, Library comments

Discussion of article "Evaluation and selection of variables for machine learning models"

MetaQuotes · 2023-02-10T04:32:15.0000000Z

New article Evaluation and selection of variables for machine learning models has been published:

This article focuses on the choice, preprocessing and evaluation of input variables for machine learning models. It describes several normalization methods and their features, and points out the stages of the process that most strongly influence the final result of training. We take a closer look at new and little-known methods for determining the informativeness of input data and for visualizing it.

With the "RandomUniformForests" package we calculate and analyze variable importance at different levels and in various combinations, the correspondence between predictors and the target, the interaction between predictors, and the selection of an optimal set of predictors taking all aspects of importance into account. With the "RoughSets" package we look at the same problem of choosing predictors from a different angle, based on a different concept. We show that not only the set of predictors but also the set of training examples can be optimized.

All calculations and experiments are executed in the R language, specifically in Revolution R Open 3.2.1.

Fig. 2. Training error depending on the number of trees

Author: Vladimir Perervenko

Yury Reshetov 2015.10.30 06:01 #11

Alexey Oreshkin:

...and there it is..... it's not even interesting to talk to such people.

The feeling is mutual: I have no interest in talking to lamers whose own cart is still stuck where it was, because they can offer nothing except running around every topic, ranting and imposing their small-minded opinion that supposedly "nothing works". It is much more interesting to talk to people who offer solutions for getting the cart moving in the right direction on the basis of personal experience rather than unsubstantiated opinion.


Vladimir 2015.10.31 17:08 #12

The article is interesting. Thanks to the author for his hard work. It would be nice to see the described methods demonstrated on a concrete example. I suggest the following example: predicting the S&P500 two quarters ahead. I have been doing this for a long time and discuss my results in another thread, where my own methodology for selecting and normalising inputs is described. I will give you and everyone else a file with all quarterly economic data since 1960. I can also provide the S&P500 quarterly averages for the same period.

Task:

1. Select input data. You can choose from economic data as well as all known indicators of the S&P500 price series itself.

2. Normalise the data.

3. Create and train a model to predict quarterly S&P500 values over the interval from 1960 to 1999 inclusive. The start of the training history can be chosen freely.

4. Show the behaviour of the model outside the training history, on the interval 2000-today.

5. Show the prediction error two quarters ahead on the training interval and outside it. The error on normalised data is calculated as follows:

Err = SQRT { SUM[(Prediction[i] - RealValue[i])^2] / SUM[RealValue[i]^2] }

Calculating the prediction error this way makes good sense. The generally accepted methodology for calculating model error is based on the squared error (written here as a plain sum of squares and denoted RMS):

RMS_model_error = SUM[(Prediction[i] - RealValue[i])^2]

Trivial predictions assume that the unnormalised value of the predicted variable will equal its last known value. When the data are normalised with respect to the last known value, the trivial prediction is simply 0. Thus, the RMS of trivial normalised predictions is:

RMS_trivial = SUM[(0 - RealValue[i])^2] = SUM[RealValue[i]^2]

The proposed prediction error Err is simply the square root of the ratio RMS_model_error/RMS_trivial. If Err > 1, the constructed model is worse than the trivial predictions.
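A minimal Python sketch of this error measure may make the comparison with the trivial forecast concrete. The normalisation used here (relative change with respect to the value known `horizon` quarters earlier) is only an assumption about what "normalised with respect to the last known value" means. The function names and the price list are hypothetical and are not real S&P500 data.

```python
import math


def normalise_to_last_known(series, horizon):
    """Relative change of series[i + horizon] with respect to series[i].

    Assumption: series[i] plays the role of the last known value, so the
    trivial forecast "nothing changes" becomes 0 after normalisation.
    """
    return [series[i + horizon] / series[i] - 1.0
            for i in range(len(series) - horizon)]


def relative_error(predictions, real_values):
    """Err = sqrt(sum((p - r)^2) / sum(r^2)).

    Err > 1 means the model is worse than the trivial forecast of 0.
    """
    if len(predictions) != len(real_values):
        raise ValueError("predictions and real_values must have equal length")
    sse_model = sum((p - r) ** 2 for p, r in zip(predictions, real_values))
    sse_trivial = sum(r ** 2 for r in real_values)
    if sse_trivial == 0:
        raise ValueError("all real values are zero, Err is undefined")
    return math.sqrt(sse_model / sse_trivial)


# Made-up quarterly averages, for illustration only.
prices = [100.0, 102.0, 101.0, 105.0, 108.0, 107.0]
real = normalise_to_last_known(prices, horizon=2)

trivial_forecast = [0.0] * len(real)
err_trivial = relative_error(trivial_forecast, real)  # 1.0 by construction
err_perfect = relative_error(real, real)              # 0.0 for a perfect model
```

The trivial forecast scores exactly 1 by construction, which is why Err = 1 serves as the break-even line for any model evaluated this way.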

If my suggestion interests you, I will post the tables of economic indicators and the S&P500 here. The details of the model, data normalisation and data selection do not interest me. I am interested in the prediction results on the out-of-sample interval 2000-today: a graph of real and predicted values, and the prediction error calculated by my formula Err.


СанСаныч Фоменко 2015.10.31 18:51 #13

Vladimir:

I suggest the following example: predicting the S&P500 two quarters ahead. [...]

Everything you describe is regression prediction, i.e. a certain value is predicted together with a specified confidence interval.

I do not understand the practical value of such predictions in trading. And here's why.

The terminal supports buy/sell orders. This is a purely nominal variable that takes qualitative values.

You may recall that there are also limit orders. But they are also based on buy/sell orders.

If we predict a value instead of buy/sell, the error interval of the prediction may include the last known value of the predicted variable, and then it is impossible to decide whether the order should be a buy or a sell.
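The objection can be illustrated with a short Python sketch. It assumes a symmetric interval of half-width `half_width` around the forecast; the function name and the numbers are hypothetical and only show when a regression forecast fails to determine the order type.

```python
def order_from_forecast(last_value, forecast, half_width):
    """Map a regression forecast with an error interval to an order type.

    Returns "buy" or "sell" only when the whole interval
    [forecast - half_width, forecast + half_width] lies on one side of the
    last known value; returns None when the interval covers that value.
    """
    lower = forecast - half_width
    upper = forecast + half_width
    if lower > last_value:
        return "buy"
    if upper < last_value:
        return "sell"
    return None  # the interval covers the last value: direction undetermined


signal_up = order_from_forecast(last_value=100.0, forecast=105.0, half_width=2.0)
signal_down = order_from_forecast(last_value=100.0, forecast=95.0, half_width=2.0)
signal_unclear = order_from_forecast(last_value=100.0, forecast=101.0, half_width=3.0)
```

In the third call the forecast is above the last value, but the interval from 98 to 104 still contains 100, so no order type follows from it.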

PS. Regression models of machine learning are widely used in economics, in almost all firms, for example for forecasting sales volume. They are also used on currency markets when hedging currency risk (components are bought for foreign currency, sales are in rubles). But in trading?


Vladimir 2015.10.31 19:47 #14

СанСаныч Фоменко:

All you have described is a regression prediction [...] I do not understand the practical value of such predictions in trading. [...]

If the output signals are buy or sell, how do we evaluate the importance or suitability of inputs according to this article? How do we quantify the success of the model? By profit? By drawdown? By profit factor (PF)? I have seen such models here many times. I will not point fingers, the authors will recognise themselves. Choosing trading metrics as the target function for evaluating inputs and the model carries a risk: instead of choosing the right inputs and model, the creators start fiddling with different ways of measuring success and end up with over-fitted EAs that drain the deposit. There are plenty of creative opportunities for self-deception.


СанСаныч Фоменко 2015.10.31 20:59 #15

Vladimir:

If the output signals are buy or sell, how do we evaluate the importance or suitability of inputs according to this article? How do we quantify the success of the model? [...]

Regression has its estimates, and classification has its estimates.

The most obvious way to evaluate a classification model is the percentage of cases in which the predicted class matches the actual class (the percentage of correctly predicted buy/sell). The article uses more informative methods to evaluate classification models. The tools are not only used but also explicitly named.

PS.

ROC is the most common.
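Both measures mentioned here can be sketched in a few lines of Python. This is not the article's R code. The labels and scores are made up, 1 stands for buy and 0 for sell, and the AUC is computed from its pairwise definition (the share of positive/negative pairs ranked correctly, with ties counted as half).

```python
def accuracy(actual, predicted):
    """Share of examples whose predicted class matches the actual class."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


def roc_auc(actual, scores):
    """Area under the ROC curve from its pairwise definition.

    Probability that a randomly chosen positive example receives a higher
    score than a randomly chosen negative one; ties count as 0.5.
    """
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up data: 1 = buy, 0 = sell; scores are model outputs for class 1.
actual = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

acc = accuracy(actual, predicted)
auc = roc_auc(actual, scores)
```

Accuracy depends on the chosen threshold, while the AUC summarises the ranking quality over all thresholds, which is one reason ROC-based measures are considered more informative.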


СанСаныч Фоменко 2015.10.31 21:00 #16

Yury Reshetov:

Where did you see regression? The article deals with binary classification:

I was answering Vladimir.

Vladimir 2015.11.01 04:58 #17

СанСаныч Фоменко:

Regression has its own estimates, and classification has its own estimates. [...] ROC is the most common.

By classification you mean classifying bars into BUY, SELL and HOLD, right? Such a classification is wrong in principle because it is ambiguous. For example, you can classify a bar as BUY even if the price went down after it, and then argue that the signal was correct because you should have sat out the drawdown until the trade became profitable. The same bar can just as easily be classified as SELL because the price went down. It can also be classified as HOLD if the price after it fluctuates within a range narrower than the expected profit. So the labels are ambiguous. Such a classification needs additional conditions: how much drawdown we allow, how long we wait for a profit, what the profit target is, and what we do at session close (do we wait for Monday?).

It is much easier to classify bars by the expected direction of price movement on the bar: up or down. In my S&P500 example above, instead of predicting the size of the price move two quarters ahead, we can limit ourselves to predicting its direction. This is unambiguous, and the error can be measured as the percentage of correctly predicted directions.
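A minimal Python sketch of such a direction score follows. The function names and the numbers are hypothetical, and treating a zero change as "down" is an arbitrary assumption made only to keep the labels binary.

```python
def direction(change):
    """+1 for an upward move, -1 otherwise (zero counts as down: an assumption)."""
    return 1 if change > 0 else -1


def direction_hit_rate(predicted_changes, real_changes):
    """Percentage of periods in which the predicted direction matches the real one."""
    if len(predicted_changes) != len(real_changes):
        raise ValueError("inputs must have equal length")
    hits = sum(1 for p, r in zip(predicted_changes, real_changes)
               if direction(p) == direction(r))
    return 100.0 * hits / len(real_changes)


# Made-up normalised changes two quarters ahead, for illustration only.
predicted = [0.02, -0.01, 0.03, -0.02]
real = [0.01, 0.02, 0.01, -0.05]

hit_rate = direction_hit_rate(predicted, real)  # 3 of 4 directions match
```

Only the sign of each forecast matters here, so a model can have a high hit rate and still a poor Err, and the other way round.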

My suggestion above still stands, but it seems to me that the authors of articles here will keep describing methods and guidelines for using tools instead of demonstrating those tools on concrete examples. All this is theory, and the money is made by writing articles and books, not by using these tools in trading. The argument about the practical usefulness of articles is not new here.


СанСаныч Фоменко 2015.11.01 07:56 #18

Vladimir:

By classification you mean classification of bars into BUY, SELL, HOLD, right? Such classification is wrong in principle [...]

1. If you had been taught as a child to read books and articles, you would have understood that what the author of the article and I wrote and what you wrote are about the same thing.

2. If you had been instilled with the habit of respecting other people as a child, you would not have allowed yourself to write or write "peeps".

Good luck to you in learning to read.


Vladimir Perervenko 2015.11.01 09:43 #19

Vladimir:

By classification you mean classification of bars into BUY, SELL, HOLD, right? Such classification is wrong in principle [...]

First, classification is defined at kindergarten level. Then we are told that it generates ambiguity (!?). And it ends as always: "Where is the key to the flat where the money is?"

You need more theoretical training. Study, study and study again ... You know.

And be more modest.

PS. Put your proposal in Freelance. Get a real product.


Yury Reshetov 2015.11.01 13:53 #20

СанСаныч Фоменко:
I was answering Vladimir.

Pardon me
