Machine learning in trading: theory, models, practice and algo-trading - page 3631
On a clean 5-year forward test there are few decent finds on the hourly timeframe, like this one.
There are few deals, because filters remove all weak patterns, leaving the strongest ones.
No, classic macroeconomic models in the vein of BEER (mentioned in my thread about FA) - the influence of FA on large timeframes (monthly, in my case).
I am not familiar with such a model.
There shouldn't be much of a trend there, as the data are mostly flow data (per period), not cumulative. As for data normalisation, I relied on the routines built into the R models - it is usually applied there by default.
1. Trend is always important: whether the index has risen five times in a row or for the first time matters to the market, I think.
2. I was rather saying that it is not quite correct to compare data from 10 years ago without preprocessing. For example, GDP or any other indicator quoted in currency has to be adjusted for inflation, and unemployment claims have to be related to the size of the working population - see the sketch below.
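A minimal pandas sketch of that kind of preprocessing (all column names and numbers are hypothetical, purely to illustrate deflating a nominal series and normalising claims by the labour force):

```python
import pandas as pd

# Hypothetical monthly data: nominal GDP, a CPI index (base = 100),
# jobless claims and the size of the labour force.
df = pd.DataFrame({
    "gdp_nominal":  [18.2, 18.5, 18.9, 19.4],
    "cpi":          [98.0, 99.1, 100.0, 101.3],
    "claims":       [210e3, 225e3, 204e3, 232e3],
    "labour_force": [160.0e6, 160.4e6, 160.9e6, 161.2e6],
})

# Deflate GDP to constant prices and express claims per 1,000 workers,
# so observations ten years apart become roughly comparable.
df["gdp_real"] = df["gdp_nominal"] / (df["cpi"] / 100.0)
df["claims_per_1000"] = df["claims"] / df["labour_force"] * 1000.0
print(df[["gdp_real", "claims_per_1000"]])
```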
First of all, of course, the number of features (more than 400 for EURUSD initially) should be less than the number of months (a little more than 200).
400 is a lot - I don't even know what could make up so many.... I'll assume there are many indicators for the individual EU countries and a breakdown by US regions (by Fed member banks - a guess). If the data is released at the same time, then in principle you can drop them, as they are components of the aggregate. Or generalise everything and add a separate predictor flagging a strong divergence in the dynamics.
I am interested in easily reproducible methods using standard packages. The main thing, of course, is that cross-validation should show sufficient significance of the remaining features - that is usually where the problem is.
You can go the other way round - divide the whole training interval into parts, say 10, build a model on each part and estimate the significance of predictors in that model. Then rank them and pick the top ten, or however many. I described such a method here earlier; it seemed to work quite well at the time.
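A rough Python sketch of that idea, assuming X is a DataFrame of predictors and y the target (the random forest and the number of parts are placeholders, not the exact setup described above):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def rank_predictors_by_parts(X: pd.DataFrame, y: pd.Series,
                             n_parts: int = 10, top: int = 10) -> pd.Series:
    """Fit one model per contiguous chunk of the training interval and
    aggregate feature importances across chunks."""
    chunks = np.array_split(np.arange(len(X)), n_parts)
    importances = []
    for idx in chunks:
        model = RandomForestClassifier(n_estimators=200, random_state=0)
        model.fit(X.iloc[idx], y.iloc[idx])
        importances.append(model.feature_importances_)
    mean_imp = pd.Series(np.mean(importances, axis=0), index=X.columns)
    return mean_imp.sort_values(ascending=False).head(top)
```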
I also did the following: I grouped predictors by timeframe, construction principle, base indicator and so on, and searched over these groups. If a group was significant, I searched for significant predictors within it by exhaustive search - adding, removing... similar to genetic selection, but done in semi-automatic mode. Effective, but time-consuming. Nowadays it could be done in Python.
For example, you can always fit a linear regression and then select the several features that are significant for it. But if you repeat this many times during cross-validation, different features turn out to be significant each time, and in the end you may be left with zero significant features.
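A small sketch of how to see that instability, assuming X and y are pandas objects: fit an OLS regression on each fold and count how often each feature comes out significant (the 5% threshold and fold count are arbitrary choices for illustration).

```python
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import KFold

def significance_frequency(X: pd.DataFrame, y: pd.Series,
                           alpha: float = 0.05, n_splits: int = 5) -> pd.Series:
    """Count how often each regressor is significant across folds; with
    correlated macro features the significant set typically changes fold to fold."""
    counts = pd.Series(0, index=X.columns)
    for train_idx, _ in KFold(n_splits=n_splits, shuffle=False).split(X):
        ols = sm.OLS(y.iloc[train_idx], sm.add_constant(X.iloc[train_idx])).fit()
        pvals = ols.pvalues.drop("const")
        counts[pvals.index[pvals < alpha]] += 1
    return counts.sort_values(ascending=False)
```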
In general, I think it is a normal situation that different features will be significant at different moments in time - as if the state of the economy, and the expectations, differ from one moment to the next....
That's why I wrote that we need a fundamental understanding of the country's situation: what the central bank wants, what it can do and what it actually does, and whether it gets what it wants...
Imho, to see at least minimal significance in cross-validation is first of all a matter of understanding it. The problem is the lack of a constant, clear market reaction to the macroeconomic background and/or its changes. Not that this is big news - for example, the opening post of my FA thread has links on this topic.
I don't think it's possible to get the point across through a handful of numbers. Perhaps the model is also missing political data, general news about wars and such...
Not familiar with this model.
Familiarise yourself
I don't even know what there is so much...
Familiarise yourself
Well, I'm all set when it comes to reinventing wheels and home-grown ideas of my own.
I'm more interested in the classics (whether ML or FA).
No, this is a typical multicollinearity problem. The final set of significant features depends not so much on time as on the initial set of features from which the selection is made. For the vast majority of FA features, the VIF is off the charts.
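For reference, a quick way to see those VIF values with statsmodels (X is assumed to be a DataFrame of the FA features; the ~10 threshold is the usual rule of thumb, not anything stated in this thread):

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

def vif_table(X: pd.DataFrame) -> pd.Series:
    """Variance inflation factor per feature; values above ~10 are the usual
    'off the charts' sign of multicollinearity."""
    Xc = add_constant(X)
    vifs = {col: variance_inflation_factor(Xc.values, i)
            for i, col in enumerate(Xc.columns) if col != "const"}
    return pd.Series(vifs).sort_values(ascending=False)
```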
About "understanding" of "Laws of Market and Nature" has already appeared somewhere on the forum).
You can say that about anything. A model has no practical meaning when it becomes a model of the whole world. It is meaningful only with a foreseeable limited set of predictors.
Familiarise yourself
I'll familiarise myself.
I sat there for 10 minutes thinking "...why, why am I trying to help someone again?".
What am I supposed to say? Apparently, offer to dig deeper into the problem and study the model for a fee?
In fact, I wrote that the RFE method works in general, and its implementation and improvement can be anything.
Still, I'm in favour of making sense of the data, not just finding a solution. Just feeding bare statistics to a model is pointless.
There should be a skeleton model of how things work, and weights thrown onto it through learning, which in effect regulate the fan of cash flows...
Well, instead of answering my question, you started teaching me FA without knowing its basics or the local calendar. You have every right to do so - the forum is open - but other forum members have exactly the same right to do some teaching of their own.
Regarding the essence of my question: RFE is used within the training of some model. Combined with cross-validation it becomes a sheer horror in terms of computation time, as does forward selection from Forester. For now I think I'll limit myself to filters in the spirit of San Sanych's ideas; for binary classification quite a lot of them can be devised.
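For comparison, one cheap univariate filter for a binary target (just an illustration using sklearn's mutual information score, not San Sanych's specific filters; X and y are assumed to be pandas objects):

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

def filter_by_mutual_info(X: pd.DataFrame, y: pd.Series, top: int = 20) -> pd.Series:
    """Score each predictor by mutual information with the binary label and keep
    the top ones - orders of magnitude cheaper than RFE inside cross-validation."""
    scores = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
    return scores.sort_values(ascending=False).head(top)
```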
I'll show you a hocus-pocus.)
I will say something that is not commonly talked about. (Keep in mind that the price series is commonly believed to be non-stationary. It may indeed be non-stationary, but it is impossible to draw 100% conclusions about that without having the whole history of the series, both past and future.)
So, we have some process (green) and some noise on it (red). We assume that the green process is our friend and we can make money on it (regularities - or, in other words, some abstract formula of the market), while the red process is also made of regularities, but short-term ones, which are difficult to track and detect in time and, most importantly, to trade profitably:
Everything is wonderful and fine. Let's move on. There is such a concept as "overfitting". This concept is "post factum", i.e. it is impossible to know in advance that the network has been overfitted before it starts working on data unknown to it. Nor do all sorts of testing, validation and other checks solve the problem: sooner or later the network will have to work on completely new, unfamiliar data. Therefore, I suggest treating the term "overfitting" as an anti-scientific, subjective and even harmful notion. Well, how else could it be: we trained the network, the network works well in plus on the real account - great, we think, we have trained it well. And the next month the network suddenly starts running in minus - oh, the network was "overfitted", right?))) Everything is clear with this term, in short.
Let's move on. Suppose we trained and trained and finally trained the network on the above process. We applied, tried on and discarded all sorts of tricky methods - classification, clustering and other automatic techniques which, like magic wands, are supposed to fight something that does not even have a precise definition and concept: overfitting. And we obtained a set of weights and biases, call it W1. So we have a trained network that would predict, as we move into the future, and would "imagine" at each step a series like this (well, that's how it thinks, that's how it sees, it's an artist):

And if we had initialised the network's weights differently, we could have trained it and obtained a different set of weights, W2, and the network's representation of the process in the future would look like this:
In both cases we would get some value of the error criterion (the one the network was trained on), say 0.01.
But in fact, attention, the natural process would move in the future like this:
And, attention again, this corresponds to the weights W3, with the same error criterion value of 0.01.
Do you see where this is going? There is a theorem which states that a network with one hidden layer can approximate any continuous function, while a network with two or more hidden layers can approximate any function, including discontinuous ones.
That is, using the same evaluation criterion we can obtain different sets of network weights. All of them - W1, W2, W3, W4, ..., Wn - describe the underlying process almost 100% within the time span available for study. At the same time, only some of the possible resulting weight sets will match the truly underlying process, and hence will hold up on unknown new data.
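This effect is easy to reproduce with a toy example: train the same small network on the same noisy series with different weight initialisations and compare the in-sample error with the behaviour beyond the training range (everything below - the sine process, the network size, the seeds - is purely illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = np.linspace(0, 10, 200).reshape(-1, 1)
y_train = np.sin(x_train).ravel() + rng.normal(0, 0.1, 200)   # "green" process + "red" noise
x_future = np.linspace(10, 15, 100).reshape(-1, 1)             # data the net has never seen

for seed in (1, 2, 3):   # three different initialisations -> W1, W2, W3
    net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=seed)
    net.fit(x_train, y_train)
    in_sample = np.mean((net.predict(x_train) - y_train) ** 2)
    print(f"seed {seed}: train MSE {in_sample:.3f}, "
          f"mean forecast beyond training range {net.predict(x_future).mean():+.2f}")
```

The in-sample errors come out similar, yet each weight set tells a different story about the unseen continuation of the series.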
So, to finish the novel "Who is to blame?", we have to ask where the real problem actually lies - but this short post already contains all the answers.
And that is without even touching on terms that cannot be spoken aloud in this thread. Those who seek shall find, those who want to know shall know, those who want to teach shall teach. Or how does the saying go?
It is a very convenient position to hold when nothing works out - and when something does work out, there are no reliable reasons not to hold it either. But I have shown by the example above that the market, being a very complex process, can very easily deceive anyone who tries to learn it and work out its formula. Don't believe me? I can easily give you the formula of a process that is very difficult to predict at the next discretisation step. And it will not be some very complicated formula of a non-stationary process with random components, but a rather simple and computable one. Yet in some cases the neural network will merely demonstrate that it has memorised the known history produced by this formula, while in other cases it will turn out that the real formula of the process has been recovered.
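One well-known example of such a simple, fully deterministic formula that is nonetheless hard to predict a step ahead is the logistic map (my illustration, not necessarily what the poster has in mind):

```python
import numpy as np

# Logistic map: x_{t+1} = r * x_t * (1 - x_t). Trivial to compute, yet for r
# near 4 tiny differences in the current state blow up quickly, so the next
# steps are hard to predict from an imperfect estimate of the state.
r, x = 3.99, 0.4
series = []
for _ in range(20):
    x = r * x * (1 - x)
    series.append(x)
print(np.round(series, 3))
```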
So the point of proper learning is to find the correct formula of the process, not its shadow; then it will not matter whether the process is stationary or not. So what was wrong in the W1, W2, W3 example - or is the opposite true? And where does the golden key to neural network learning lie?
I don't see where I started writing about FA at all. I wrote about duplication of information in the news - I have just checked, and indeed nothing has changed in a decade.
And, funnily enough, I left that thread precisely because I have a relevant degree and am not very interested in arguing about these topics. I don't consider myself a guru, but I realise that it's not that simple.
Right, trained CatBoost. For me it is not a problem to wait a couple of weeks for the result.
You can post your sample, I will try to see through quantisation if there is anything useful there within my training paradigm.
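If it helps, a minimal CatBoost sketch of what looking at a sample "through quantisation" might involve - border_count controls how coarsely each feature is binned; X and y stand in for the posted sample, and the parameter values are guesses, not the poster's actual settings:

```python
from catboost import CatBoostClassifier, Pool

# X: DataFrame of predictors, y: binary target (placeholders for the posted sample).
train_pool = Pool(X, y)

model = CatBoostClassifier(
    iterations=500,
    depth=6,
    border_count=32,   # coarser quantisation: each feature is cut into 32 bins
    verbose=100,
)
model.fit(train_pool)

# Rank the predictors the model actually found useful after quantisation.
importance = model.get_feature_importance(train_pool, prettified=True)
print(importance.head(10))
```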
There are no formulas that can reliably describe this process. I think wrong expectations can lead to wasted effort. You can imagine that a price chart is the sound wave from a rubbish-shredding machine: you can learn to identify what the machine is shredding right now, but what comes next in the queue is not so easy to tell without data from outside those sound waves.
Here on the forum the term "overfitting" is often misused to mean simply a bad result. In reality the term implies an excessively tight fit to the training data, tracing every point with great precision - for trees, splitting deeper and deeper until each leaf contains 100% of a single class. In fact, the problem is often a change in the probability distribution of the observed data from which the predictors were calculated. I showed all this in detail in this thread at the beginning of the year.
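A toy illustration of the term in its strict sense: a tree grown until every leaf is pure fits the training data essentially to 100%, while its score on held-out data is noticeably lower (the data, sizes and noise level below are synthetic and arbitrary):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + rng.normal(scale=1.0, size=1000) > 0).astype(int)  # noisy binary target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False)

tree = DecisionTreeClassifier(max_depth=None, random_state=0)  # grow until leaves are pure
tree.fit(X_tr, y_tr)
print("train accuracy:", tree.score(X_tr, y_tr))   # ~1.0: every leaf is 100% one class
print("test accuracy: ", tree.score(X_te, y_te))   # noticeably lower: the classic overfit
```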
But do tell me about the key - the yellow metal is always in demand.