Machine learning in trading: theory, models, practice and algo-trading - page 79

 
Dr.Trader:

I used the "repeatedcv" method in trainControl, with the default split. Recently I wrote code for crossvalidation myself, tried crossvalidation both with just randomly taken bars, and with consecutively taken chunks without gaps. I didn't see any difference on the fronttest, it gave about the same result in both cases. I divided the training/crossvalidation data 50%/50%, maybe at that ratio it doesn't matter anymore.
I'll experiment with that in caret...
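For reference, a minimal sketch of the two setups being compared, assuming caret; the data frame `d` and target column `y` are placeholders:

```r
library(caret)

# caret's default: random folds, repeated ("repeatedcv")
ctrl_random <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# alternative: folds as consecutive chunks without gaps, built by hand
n      <- nrow(d)                                  # `d` is an assumed training data.frame
chunks <- split(seq_len(n), cut(seq_len(n), 10, labels = FALSE))
ctrl_chunks <- trainControl(
  method   = "cv",
  index    = lapply(chunks, function(ch) setdiff(seq_len(n), ch)),  # train on the rest
  indexOut = chunks                                                 # validate on one contiguous chunk
)

fit <- train(y ~ ., data = d, method = "rf", trControl = ctrl_chunks)
```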

I remember in that article you posted a while ago, the leading comparison was boosted trees with Platt's method (something like that). All I found on Google about this method is that you pass the model output through a sigmoid and take its result. Can gbm or xgboost do that? This approach does somewhat better than forest, neural networks and some "bagged trees", which are in second place.
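For what it's worth, Platt's method is exactly that: fit a logistic regression (a sigmoid) on the raw model scores using held-out data, and use its output as the calibrated probability. A minimal sketch with gbm; the fitted `model`, the calibration set `calib` with a 0/1 column `y`, and `newdata` are all assumed:

```r
library(gbm)

# raw (uncalibrated) scores on a held-out calibration set
raw <- predict(model, calib, n.trees = model$n.trees, type = "link")

# Platt scaling: logistic regression maps raw scores to probabilities
platt <- glm(y ~ raw, data = data.frame(y = calib$y, raw = raw), family = binomial)

# calibrated probability for new observations
raw_new <- predict(model, newdata, n.trees = model$n.trees, type = "link")
p       <- predict(platt, data.frame(raw = raw_new), type = "response")
```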

Gbm and xgboost are boosted trees. For better gradient convergence, each new tree is built on observations weighted by the training results of the previous tree. Both linear and nonlinear models can be boosted...

In second place is the random forest. That is bagging, as I understand it: averaging the results of several models built on different subsamples of the data.

Read about gradient boosting. It's hard to find anything better for classification. I do classification on the results of a regression predictor, for example.
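A minimal boosted-trees classification sketch with xgboost, under the assumption that `X` is a numeric feature matrix and `y` holds 0/1 labels; the parameter values are illustrative only:

```r
library(xgboost)

dtrain <- xgb.DMatrix(data = X, label = y)

params <- list(
  objective = "binary:logistic",  # boosted trees for binary classification
  eta       = 0.1,                # learning rate (shrinkage per tree)
  max_depth = 4
)

# each new tree is fitted to the gradient of the loss on the current ensemble
bst <- xgb.train(params = params, data = dtrain, nrounds = 200)

p <- predict(bst, X)  # predicted probabilities of class 1
```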
 
About CV. The default is random partitioning... For time series, temporal separation is important... caret can do this: look up caret CV with custom folds for time series. In the code I posted earlier it's implemented in trainControl.
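Caret's own helper for this is createTimeSlices, which builds ordered train/validation windows whose indices plug straight into trainControl. A sketch, with `prices` (a time-ordered data frame) and its target column `y` assumed:

```r
library(caret)

slices <- createTimeSlices(
  seq_len(nrow(prices)),
  initialWindow = 500,   # bars in each training window
  horizon       = 100,   # bars in the validation window that follows it
  fixedWindow   = TRUE
)

ctrl <- trainControl(
  method   = "cv",
  index    = slices$train,  # custom folds that preserve time order
  indexOut = slices$test
)

fit <- train(y ~ ., data = prices, method = "gbm", trControl = ctrl, verbose = FALSE)
```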
 
Alexey Burnakov:
About CV. The default is random partitioning... For time series, temporal separation is important... caret can do this: look up caret CV with custom folds for time series. In the code I posted earlier it's implemented in trainControl.

I look at you and am amazed... You want to get something out of nothing. That is, from zero you want to get a 0.00000000000000000000000000000000000000000001 result. I can't watch it any longer, so I'll give you a hand. Actually, the topology of the network is secondary. In machine learning, in forex and beyond, the data comes first. That is, the most important thing in designing a neural network is not its topology or the training method; it is the input and output data. If the data is relevant to the market, then even a small perceptron will solve your classification problem and will keep working in the future, for one simple reason: the input data is relevant to the market and is able to predict it. What you are trying to do is squeeze information out of thin air. Sorry, but the result will be just as irrelevant... As for the market: in forex, the primary thing is volume, and then the market's reaction to it. And not the volume shown in MT (ticks), but the actual volume of futures, the euro for example. The cluster delta is a great help; there is a lot of useful information in it. So using trade volume will significantly increase the performance of any network, even the simplest perceptron. There is also the delta, which is extremely useful too. And you are trying to build a model on indicators that are secondary, I would even say tertiary, after the price, and expect a miracle from it. A miracle will not happen, I assure you...

P.S. Just think about who you are trying to go up against... Corporations with teams of the best programmers and more processing power than your quad-core, with money invested in developing new methods, and so on. And here Alexey from the simple Russian countryside has decided to hack the market in 5 years and get the grail. Come down from the clouds and take off the rose-colored glasses...

 
Mihail Marchukajtes:

In machine learning, as applied to forex and not only forex, the data comes first. That is, the most important thing when designing a neural network is not its topology or the training method; it is the input and output data. If the data is relevant to the market, then even a small perceptron will solve your classification problem and will keep working in the future, for one simple reason: the input data is relevant to the market and is able to predict it.

I don't even have anything to argue with; everything is correct. We know that too, and we discuss not only classification models but also methods of selecting predictors (input data); read this thread from the start.

I assume you're hoping to manually pick a dozen inputs, build a model, trade for a week, start losing, and start picking inputs again. I used to do that too; sometimes I ended up with funny strategies like "take a specific seed to initialize the neuron, train it for exactly 7777 iterations, everything will be fine, but every other Tuesday trade against its signal, and re-optimize the network on new data every other day". Such strategies are real, but it takes a long time to find something like that, and it only stays profitable for a couple of weeks, because such a strategy rests on some short-term pattern.

Instead, I now use an algorithm for automatic selection of inputs. There are about 100 inputs for each bar, a hundred bars deep, and the algorithm selects a combination of inputs that together give a stable, correct buy/sell signal over the whole year. It is not like optimizing Expert Advisors in MT5, where you can achieve great results and then lose the profit on the fronttest; it is more involved, with cross-validation and different criteria for evaluating the result. I used to end up with about a hundred inputs; now it's a bit less, just a couple dozen. I get 60-70% accuracy on the fronttest, but it is still unstable; I need to remove the degrees of freedom from the whole selection and training process so that I get approximately the same result even when starting from scratch each time.
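That selection algorithm isn't shown here, but as a rough illustration of the general idea of cross-validated input selection, caret's recursive feature elimination works in a similar spirit; `X` as the matrix of candidate inputs and `y` as the buy/sell factor are assumed:

```r
library(caret)
library(randomForest)

# score shrinking subsets of inputs by cross-validation and keep the best
# (for bar data the folds should really preserve time order, as discussed above)
ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
sel  <- rfe(X, y, sizes = c(10, 20, 50), rfeControl = ctrl)

predictors(sel)  # the surviving combination of inputs
```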

Mihail Marchukajtes:

P.S. Just think about who you are trying to go up against... Corporations with teams of the best programmers and more processing power than your quad-core, with money invested in developing new methods, and so on. And here Alexey from the simple Russian countryside has decided to hack the market in 5 years and get the grail. Come down from the clouds and take off the rose-colored glasses...

Corporations with their own facilities and programmers use the same data-analysis and modeling software that is available to us. Let's say they train a perfect model and get 100% profit per month. I have less power; with the same data I can build a weaker model with, say, only 50% profit. That's enough for me.

 
Mihail Marchukajtes:

I look at you and am amazed... You want to get something out of nothing. That is, from zero you want to get a 0.0000000000000000000000000000000000000000000000000000 result. I can't watch it any longer, so I'll give you a hand. Actually, the topology of the network is secondary. In machine learning, in forex and beyond, the data comes first. That is, the most important thing in designing a neural network is not its topology or the training method; it is the input and output data. If the data is relevant to the market, then even a small perceptron will solve your classification problem and will keep working in the future, for one simple reason: the input data is relevant to the market and is able to predict it. What you are trying to do is squeeze information out of thin air. Sorry, but the result will be just as irrelevant... As for the market: in forex, the primary thing is volume, and then the market's reaction to it. And not the volume shown in MT (ticks), but the actual volume of futures, the euro for example. The cluster delta is a great help; there is a lot of useful information in it. So using trade volume will significantly increase the performance of any network, even the simplest perceptron. There is also the delta, which is extremely useful too. And you are trying to build a model on indicators that are secondary, I would even say tertiary, after the price, and expect a miracle from it. A miracle will not happen, I assure you...

P.S. Just think about who you are trying to go up against... Corporations with teams of the best programmers and more processing power than your quad-core, with money invested in developing new methods, and so on. And here Alexey from the simple Russian countryside has decided to hack the market in 5 years and get the grail. Come down from the clouds and take off the rose-colored glasses...

What a demagogue, ouch. It's time for me to get out of here and go build a house.

"The dog barks, the caravan goes." С

 
Dr. Trader:

I don't even have anything to argue with; everything is correct. We know that too, and we discuss not only classification models but also methods of selecting predictors (input data); read this thread from the start.

I assume you're hoping to manually pick a dozen inputs, build a model, trade for a week, start losing, and start picking inputs again. I used to do that too; sometimes I ended up with funny strategies like "take a specific seed to initialize the neuron, train it for exactly 7777 iterations, everything will be fine, but every other Tuesday trade against its signal, and re-optimize the network on new data every other day". Such strategies are real, but it takes a long time to find something like that, and it only stays profitable for a couple of weeks, because such a strategy rests on some short-term pattern.

Instead, I now use an algorithm for automatic selection of inputs. There are about 100 inputs for each bar, a hundred bars deep, and the algorithm selects a combination of inputs that together give a stable, correct buy/sell signal over the whole year. It is not like optimizing Expert Advisors in MT5, where you can achieve great results and then lose the profit on the fronttest; it is more involved, with cross-validation and different criteria for evaluating the result. I used to end up with about a hundred inputs; now it's a bit less, just a couple dozen. I get 60-70% accuracy on the fronttest, but it is still unstable; I need to remove the degrees of freedom from the whole selection and training process so that I get approximately the same result even when starting from scratch each time.

Corporations with their own facilities and programmers use the same data-analysis and modeling software that is available to us. Let's say they train a perfect model and get 100% profit per month. I have less power; with the same data I can build a weaker model with, say, only 50% profit. That's enough for me.

Let me put it this way: the top funds show an average annual return of 40-50%, and smart, brilliant people work there. I don't see anything unusual in me getting close to the 50%-a-year mark and growing from there.
 
Alexey Burnakov:
Let me put it this way: the top funds show an average annual return of 40-50%, and smart, brilliant people work there. I don't see anything unusual in me getting close to the 50%-a-year mark and growing from there.

First: the funds show such a modest yield for one reason only, the lack of liquidity in the market. It is hard to put a large amount of money into a strategy; you do not have that problem.

Second: why not aim for, say, 100% per month?

I completely agree with Mihail Marchukajtes: to qualitatively raise the recognition rate you need to improve the quality of the features; the models themselves influence the overall result by about +/- 5%.

 
mytarmailS:

First: the funds show such a modest yield for one reason only, the lack of liquidity in the market. It is hard to put a large amount of money into a strategy; you do not have that problem.

Second: why not aim for, say, 100% per month?

I completely agree with Mihail Marchukajtes: to qualitatively raise the recognition rate you need to improve the quality of the features; the models themselves influence the overall result by about +/- 5%.

You are also a demagogue. Well then, show us inputs with that degree of information content. Why do we use the top models? To squeeze signal out of noisy data; if we had noise-free data, we could write the formula in Excel.

"100% per month." Strive, show results, share ideas. Let's hear how to increase profitability by 20 times and not to plummet in the next month from the drawdown.

 
Alexey Burnakov:

You are also a demagogue. Well then, show us inputs with that degree of information content. Why do we use the top models? To squeeze signal out of noisy data; if we had noise-free data, we could write the formula in Excel.

"100% per month." Go for it, show results, share ideas. We'd love to hear how to increase profitability twentyfold without blowing up on a drawdown the very next month.

"Long-livers" forex. More than 5 years of trading. Sorting by FS. Yes, some have cosmic returns, but other stats are bad. This is reality. And Stabiliti trades hands. All others show FS 3 and less.

 
Alexey Burnakov:

You are also a demagogue. Well then, show us inputs with that degree of information content. Why do we use the top models? To squeeze signal out of noisy data; if we had noise-free data, we could write the formula in Excel.

"100% per month." Go for it, show results, share ideas. We'd love to hear how to increase profitability twentyfold without blowing up on a drawdown the very next month.

Yes, we are all demagogues here, only you are d'Artagnan; that much is already clear. Good that at least you're not a troll... bye... :)


I think that the main thing is... if you do not know what to expect from the market, you will not get an answer from it.

This man holds a Ph.D. in Technical Sciences and defended his dissertation on AI long ago (about 20 years back). He has been building robots for over 20 years and has a lot of experience.

He says the market cannot be predicted from a black-box position: you have to identify the working attributes, understand how and why they work, and filter the data so that only what works remains, discarding the noise.

He has about 100 features (predictors) in his network, and behind each feature there is a whole library or package, if you like.

Now compare the gulf in quality between a feature that requires a whole library and some silly little "SMA", "MACD", RSI and the rest... I don't think there is even 0.00000001% of useful information in them, just as Mihail Marchukajtes wrote, and that is a fact; otherwise models would show the efficiency they are actually capable of, meaning 90% correct answers.

He recommends reading about GMDH (the Group Method of Data Handling, "МГУА") and about spectral analysis, Fourier analysis in particular.
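A first look at spectral analysis takes a couple of lines in R; a sketch, assuming `prices` is a numeric vector of close prices:

```r
# work on log returns rather than raw prices to reduce the trend component
rets <- diff(log(prices))

# smoothed periodogram of the return series
spec <- spectrum(rets, plot = FALSE)

# period (in bars) of the strongest spectral component, if any stands out
1 / spec$freq[which.max(spec$spec)]
```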

========================================================

Now about what results I, the "demagogue", have achieved. They are in fact very modest. I think I have plenty of good ideas, but my research runs in many directions at once and I badly lack knowledge in various fields, so I often ask forum members for help; but nobody particularly wants to help, they say "learn it yourself". But if I had already mastered everything myself, what would I need this communication for? Where is the logic in that... well, I digress.


Here is the best I have managed to squeeze out of RF on new data so far: 50% per month for two months in a row, but everything is still very unstable. I tried to upload the screenshots 10 times and couldn't (got it now).


The bottom line: there is no need to limit yourself to templates like "30% a year is cool". It's not cool; it's a cage for the mind and for creativity.
