Machine learning in trading: theory, models, practice and algo-trading - page 303

 
Yuriy Asaulenko:
This is certainly true. But the high entry threshold increases all sorts of risks. Not necessarily financial.

Also, all of the above can be put in a banal form: "profit is a monotonic function of reasonable risk", which applies not only to trading but to all human activity. Besides, it is known that those who believe they take no risk at all, for example by choosing a non-competitive profession and relying on the state pension, are themselves the real extremists))


And what is life "without" risk? It's not interesting at all, considering that it will end the same for everyone.

 
Well, my article covers confidence intervals and so on. I strongly recommend reading it carefully; you can take a lot from it.
 
Yuriy Asaulenko:

By a systematic approach I mean understanding what you are doing and, accordingly, being able to plan and predict the results of your actions.

Thanks for the article. Since I am not familiar with any of the specific software, it is perfect for a beginner: simple and clear. The only thing I don't understand is which method is used, regression or classification?
Naturally, I immediately began trying it on my own systems. If some question turns out to be a problem, I will sort it out as I go along.

1. I don't use candlesticks to enter and exit, only the stream of quotes; candlesticks are used only for the history up to the previous candle. For training this is fine, let it learn by candles, but how to make Rattle swallow the flow of quotes inside the current candle is still a mystery to me. That intra-candle flow has to be analysed somehow.

2. What to do with predictors that get rebuilt, for example regression lines and their sigmas? You cannot simply paste them into the history (for training); we need functions that calculate them on the fly and leave no trace of them in the history (a sketch of such an on-the-fly calculation follows at the end of this post).

3. Similarly, there are "flickering" predictors that do not always exist, are built from particular points of the series, and in general can also be rebuilt as trading goes on.

4. The issue of normalizing the predictors from items 2 and 3: it looks fundamentally impossible.

And the history for these predictors has to be calculated on the fly, both during training and during live work.

So far we have nothing but confusion.
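A minimal sketch, with assumed column names and window length, of what such an on-the-fly calculation of a rebuildable predictor could look like in R: for every bar a regression line is fitted over the last n closes, and only its slope and residual sigma are returned as predictors, so nothing needs to be pasted into the history.

# Sketch: recompute a "rebuildable" predictor on the fly (base R, hypothetical names).
rolling_regression_features <- function(close, n = 50) {
  k     <- length(close)
  slope <- rep(NA_real_, k)   # slope of the regression line over the window
  sigma <- rep(NA_real_, k)   # sigma of the deviations from that line
  for (i in n:k) {
    window   <- close[(i - n + 1):i]
    fit      <- lm(window ~ seq_len(n))
    slope[i] <- coef(fit)[2]
    sigma[i] <- sd(residuals(fit))
  }
  data.frame(slope = slope, sigma = sigma)
}

# usage (hypothetical): feats <- rolling_regression_features(quotes$close, n = 50)

The same function is called both when the training set is built and on every new bar during live work, so the predictor is always consistent with the current state of the line.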


Rattle is good in two cases:

  1. For a first acquaintance.
  2. When you face a problem for the first time and need to think it over and try things...

1. Regression or classification. This is determined by the type of the target variable: a real number means regression, a nominal value (factor) means classification (see the sketch after this list).

2. You have to start with the target variable. Despite its apparent simplicity, this is a complicated question. What are you going to predict: direction, magnitude, an excess, a level...?

3. Predictors. You have to prove that they "relate" to the target variable. This is the hardest part. I spend up to 70% of my time on this question. I've written a lot about it in this thread.

4. When the standard, static functionality no longer satisfies you, that is where moving to pure R comes in. Rattle makes the transition easier because it logs all your actions as R code, and this ready-made code can then be modified. The next step is usually caret.
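On point 1, a minimal sketch (the objects feats, direction and next_move are hypothetical, not from the article) showing that in R it really is the class of the target variable that selects the task:

library(randomForest)

# feats: data frame of predictors; direction: class labels; next_move: real-valued move size
rf_class <- randomForest(x = feats, y = factor(direction))  # factor target  -> classification
rf_reg   <- randomForest(x = feats, y = next_move)          # numeric target -> regression

The same holds for the code Rattle logs and for caret::train(): make the target a factor and you get classification, leave it numeric and you get regression.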

 
SanSanych Fomenko:


2. You have to start with the target variable. Despite its apparent simplicity, this is a complicated question. What are you going to predict: direction, magnitude, an excess, a level...?

It seems it is necessary to predict everything). At the moment there are about 30 "indicators" (more correctly, predictors), plus their combined processing and the logic in the model (not ML). Another 10 or so were planned.

Coping with such a volume manually, and still understanding the contribution of each predictor to the whole, is no longer realistic. Hence, by the way, the idea of using ML. Everything is still in its infancy.
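A minimal sketch, assuming a data frame dat with those ~30 predictors and a factor column target, of how a model can itself report the contribution of each predictor; this is the standard randomForest importance mechanism, not the exact procedure discussed in the thread.

library(randomForest)

rf  <- randomForest(target ~ ., data = dat, importance = TRUE)
imp <- importance(rf)                                # accuracy- and Gini-based importance
print(imp[order(-imp[, "MeanDecreaseGini"]), ])      # predictors ranked by contribution
varImpPlot(rf)                                       # quick visual ranking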

SanSanych Fomenko:

3. Predictors. We have to prove that they "relate" to the target variable. This is the most difficult. This question eats up to 70% of my time. I have written a lot about it in this thread.

Yes, I will have to adapt it. I guess you can't just stick it in).

SanSanych Fomenko:

4. When the standard, static functionality no longer satisfies you, that is where moving to pure R comes in. Rattle makes the transition easier because it logs all your actions as R code, and this ready-made code can then be modified. The next step is usually caret.

Got it.

 

An interesting table: the most commonly used ML packages.

class | name | package | downloads
surv.coxph | Cox Proportional Hazard Model | survival | 153681
classif.naiveBayes | Naive Bayes | e1071 | 102249
classif.svm | Support Vector Machines (libsvm) | e1071 | 102249
classif.lda | Linear Discriminant Analysis | MASS | 55852
classif.qda | Quadratic Discriminant Analysis | MASS | 55852
classif.randomForest | Random Forest | randomForest | 52094
classif.gausspr | Gaussian Processes | kernlab | 44812
classif.ksvm | Support Vector Machines | kernlab | 44812
classif.lssvm | Least Squares Support Vector Machine | kernlab | 44812
cluster.kkmeans | Kernel K-Means | kernlab | 44812
regr.rvm | Relevance Vector Machine | kernlab | 44812
classif.cvglmnet | GLM with Lasso or Elasticnet Regularization (Cross Validated Lambda) | glmnet | 41179
classif.glmnet | GLM with Lasso or Elasticnet Regularization | glmnet | 41179
surv.cvglmnet | GLM with Regularization (Cross Validated Lambda) | glmnet | 41179
surv.glmnet | GLM with Regularization | glmnet | 41179
classif.cforest | Random Forest Based on Conditional Inference Trees | party | 36492
classif.ctree | Conditional Inference Trees | party | 36492
regr.cforest | Random Forest Based on Conditional Inference Trees | party | 36492
regr.mob | Model-based Recursive Partitioning Yielding a Tree with Fitted Models Associated with each Terminal Node | party, modeltools | 36492
surv.cforest | Random Forest Based on Conditional Inference Trees | party, survival | 36492
 
SanSanych Fomenko:

An interesting table: the most commonly used ML packages.

(see the table above)

I forgot to include one more in the list. Ah yes, my software is unique, one might say rare :-)
 
Yuriy Asaulenko:

It seems it is necessary to predict everything). At the moment there are about 30 "indicators" (more correctly, predictors), plus their combined processing and the logic in the model (not ML). Another 10 or so were planned.

Coping with such a volume manually, and still understanding the contribution of each predictor to the whole, is no longer realistic. Hence, by the way, the idea of using ML. Everything is still in its infancy.

SanSanych Fomenko:

3. Predictors. We have to prove that they "relate" to the target variable. This is the most difficult. This question eats up to 70% of my time. I have written a lot about it in this thread.

Yes, I will have to adapt it. I guess you can't just stick it in).

SanSanych Fomenko:

4. When the standard, static functionality no longer satisfies you, that is where moving to pure R comes in. Rattle makes the transition easier because it logs all your actions as R code, and this ready-made code can then be modified. The next step is usually caret.

Got it.

I will add my five kopecks. In fact, you should feed inputs that are the cause not of the output variable but of the PRICE! Then any TS will train well. Examples of target functions:

The most obvious: is the SIGNAL any good; then: will there be a pullback to a certain level; which of today's levels will be reached, and so on. Read my article, don't be lazy, I mention this there. For all these target functions I feed the same inputs, and all the models work quite satisfactorily. And you can see how the same inputs look at the market: here at the profit, here at the pullback, here at the level. They work well because the inputs are the cause of the price.

I will explain a little what I mean by "cause": a change in the input leads to a change in the price, and not the other way round. This is easy to confuse, and then the statistics of the TS itself turn out very poor. Z-score, for example, takes exactly the value that the price dictates to it, not vice versa, whereas delta, say, is a cause of the price change. :-)
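As an illustration of one of these target functions, here is a minimal R sketch (the vectors close and level and the horizon are assumptions, not the author's code) that builds a "will there be a pullback to a given level" target; the result is a factor, i.e. a classification target.

# Sketch: binary target "price pulls back to the level within 'horizon' bars".
make_pullback_target <- function(close, level, horizon = 20) {
  k   <- length(close)
  hit <- logical(k)                        # last 'horizon' bars stay FALSE (no lookahead data)
  for (i in seq_len(k - horizon)) {
    future <- close[(i + 1):(i + horizon)]
    hit[i] <- any(future <= level[i])      # TRUE if the level is touched from above
  }
  factor(hit, levels = c(FALSE, TRUE), labels = c("no_pullback", "pullback"))
}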

 
Mihail Marchukajtes:

I will add my five kopecks. In fact, you should feed inputs that are the cause not of the output variable but of the PRICE! Then any TS will train well. Examples of target functions:

The most obvious: is the SIGNAL any good; then: will there be a pullback to a certain level; which of today's levels will be reached, and so on. Read my article, don't be lazy, I mention this there. For all these target functions I feed the same inputs, and all the models work quite satisfactorily. And you can see how the same inputs look at the market: here at the profit, here at the pullback, here at the level. They work well because the inputs are the cause of the price.

I will explain a little what I mean by "cause": a change in the input leads to a change in the price, and not the other way round. This is easy to confuse, and then the statistics of the TS itself turn out very poor. Z-score, for example, takes exactly the value that the price dictates to it, not vice versa, whereas delta, say, is a cause of the price change. :-)

I have read your article, if you mean the link on the previous page. Maybe I have missed something. I will reread it.

Of course, the predictors are there to predict the price movement. But their superposition plus the price gives a signal to enter, i.e. they predict the reaction of the original (training) black box. The question is like the chicken and the egg: which comes first? Perhaps the disagreement is purely terminological.

Ideologically, at least in systems with rigid logic, it is more correct to predict the price; the output variable is already the result of processing it.

 
Yuriy Asaulenko:

I have read your article, if you mean the link on the previous page. Maybe I have missed something. I will reread it.

Of course, the predictors are there to predict the price movement. But their superposition plus the price gives a signal to enter, i.e. they predict the reaction of the original (training) black box. The question is like the chicken and the egg: which comes first? Perhaps the disagreement is purely terminological.

Ideologically, at least in systems with rigid logic, it is more correct to predict the price; the output variable is already the result of processing it.


That's right, but you have to predict the price using the data that makes it change. There is a very interesting observation here. If the input is a cause of the price, the out-of-sample result will be slightly worse than in training, i.e. the NN works in training and still works out of sample, only worse, sometimes significantly, sometimes not; it all depends on the model. But when you feed inputs that are not a cause of the price and merely depend on it, out-of-sample work turns into a coin flip: you never know when the NN will make a mistake. Something like that....
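A minimal sketch of this in-sample / out-of-sample check (the data frame dat with a factor target is hypothetical, and a random forest stands in for the NN mentioned above): train on the first part of the history in chronological order and compare accuracy on the held-out tail; accuracy falling towards ~50% on the tail is the "coin flip" behaviour described here.

library(randomForest)

n     <- nrow(dat)
split <- floor(0.7 * n)                       # chronological split, no shuffling
train <- dat[1:split, ]
oos   <- dat[(split + 1):n, ]

model   <- randomForest(target ~ ., data = train)
acc_in  <- mean(predict(model) == train$target)        # out-of-bag estimate on the training part
acc_oos <- mean(predict(model, oos) == oos$target)     # accuracy on the held-out tail
cat("in-sample:", acc_in, " out-of-sample:", acc_oos, "\n")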
 
Mihail Marchukajtes:

That's right, but you have to predict the price using the data that makes it change. There is a very interesting observation here. If the input is a cause of the price, the out-of-sample result will be slightly worse than in training, i.e. the NN works in training and still works out of sample, only worse, sometimes significantly, sometimes not; it all depends on the model. But when you feed inputs that are not a cause of the price and merely depend on it, out-of-sample work turns into a coin flip: you never know when the NN will make a mistake. Something like that....

Actually, we have no data on which the price and its changes depend, and we cannot have it unless we are insiders. In practice we look for indirect (secondary) information about the future in the behaviour of the price itself. That is, our data depends precisely on the price and its behaviour in the past and present.

As for the statement "you should predict the price using the data that makes it change", I cannot agree with it. Although it is indisputable that the better the inputs forecast the price, the better the results.

------------------------------

I have started preparing predictors for migration to ML. I wanted to do everything in R, but it turned out that R, for all its power, is not at all suited to modelling and signal processing. Unfortunately. Everything is extremely inconvenient.

I will have to move all the preparatory work to SciLab, where everything is much easier and more convenient. SciLab is an environment with an interface and ideology very close to R, designed for data processing and mathematical modelling. It has everything from radio engineering to aerodynamics, and a lot of mathematics that is completely missing in R. It has its own specifics, though: statistical methods and data mining are reasonably well represented in SciLab, but in the choice of such methods it is significantly inferior to R. You can't build SanSanych's random forests in SciLab.) There are plenty of add-on packages, but nothing seems close.

In general, I will have to combine different environments for different problems and transfer data between them. Too bad. I wanted to do everything in the best way (in R), but it turned out as it always does.
