Machine learning in trading: theory, models, practice and algo-trading - page 2623

 
JeeyCi #:

and the answer wasn't meant for you - you still can't read...

If you recommend an article that contains nonsense, it means you've swallowed that nonsense yourself (which says something about your competence), and you're recommending that others swallow it too...

Question: if that answer wasn't addressed to me, does it stop being nonsense? The question is not rhetorical.
 
JeeyCi #:

You don't even need a 2nd model here, do you? - Cross Validation and Grid Search for Model Selection ...

but maybe just the confusion matrix will answer your 2nd question (the purpose of the 2nd model of your idea)...

... or

... I just doubt you need the 2nd model ... imho

That's just it - an improvement in the confusion matrix is exactly what is claimed when the second model is used, if you read Prado, for example. But he also uses oversampling of the examples for the first model, to increase the number of true positives or something like that. I've forgotten already, unfortunately.
With one model you can improve one thing at the expense of another, while with 2 models you can supposedly improve everything. Look up Prado's Confusion Matrix or Prado's Meta Labeling. I'm on my phone.
And cross-validation will show "good" after such manipulations; you can simply look at the TS equity/balance curve on new data and see everything at once )
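A rough sketch of what that check could look like, assuming Python with scikit-learn and stand-in data (the model, split point and features are arbitrary): cross-validate on the older part of the series, then look at a simple cumulative result on the held-out newer slice.

# Sketch: time-ordered CV plus a separate "new data" check (illustrative only).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                 # stand-in features
y = (rng.random(1000) > 0.5).astype(int)        # stand-in labels: 1 = long, 0 = short
returns = rng.normal(scale=0.001, size=1000)    # stand-in bar returns

split = 800                                      # older part for CV, newer part held out
model = GradientBoostingClassifier()
cv = TimeSeriesSplit(n_splits=5)
print("CV accuracy:", cross_val_score(model, X[:split], y[:split], cv=cv).mean())

# "Equity" on the unseen tail: trade in the predicted direction, sum signed returns.
model.fit(X[:split], y[:split])
direction = np.where(model.predict(X[split:]) == 1, 1, -1)
equity = np.cumsum(direction * returns[split:])
print("final equity on new data:", equity[-1])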
 
Maxim Dmitrievsky #:
That's just it - an improvement in the confusion matrix is exactly what is claimed when the second model is used, if you read Prado, for example. But he also uses oversampling of the examples for the first model, to increase the number of true positives or something like that. I've forgotten already, unfortunately.
With one model you can improve one thing at the expense of another, while with 2 models you can supposedly improve everything. Look up Prado's Confusion Matrix or Prado's Meta Labeling. I'm on my phone.

up-sampling & down-sampling are for imbalanced datasets and small training sets - if that's what you mean - i.e. giving higher weights to the smaller class and vice versa... Yes, probably to increase them (true positives)...
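For illustration, a minimal Python/scikit-learn sketch of those two options on made-up data - class weights vs. up-sampling the minority class (names and sizes are arbitrary):

# Sketch: two common ways to handle an imbalanced training set (illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) > 0.9).astype(int)   # roughly 10% minority class

# 1) Give higher weight to the smaller class instead of resampling.
weighted = LogisticRegression(class_weight="balanced").fit(X, y)

# 2) Up-sample the minority class so both classes end up the same size.
X_min, y_min = X[y == 1], y[y == 1]
X_maj, y_maj = X[y == 0], y[y == 0]
X_up, y_up = resample(X_min, y_min, replace=True, n_samples=len(y_maj), random_state=0)
X_bal = np.vstack([X_maj, X_up])
y_bal = np.concatenate([y_maj, y_up])
upsampled = LogisticRegression().fit(X_bal, y_bal)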

***

and about 2 models - well, it's probably possible to filter twice: first the signals, to set the weights, then the trades on them according to those weights (triggered by the inputs at the 2nd weighing)... although it does look as if one could learn on deals together with their context - and keep the gradient for the earlier part of the time series - a good idea... BUT the usual implementation when working with context is still a bit different: the task is to encode the "transaction and its context", and the 2nd RNN takes the processing result of the 1st for decoding at the output -- which has little to do with running 2 networks on 2 different tasks (e.g. context and transactions), since in fact the "transaction and context" (as a pair!!!) is processed and passed through both networks... - that only solves the speed issue, not (or to a lesser extent) the validity of the output... imho...
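A minimal Keras-style sketch of that encoder-decoder wiring, purely to show the idea (layer sizes and input shapes are my own assumptions, not anyone's working code): the 1st RNN encodes the context, the 2nd RNN decodes the transactions starting from the 1st one's state, so the pair passes through both networks together.

# Sketch of the encoder-decoder idea: the 2nd RNN decodes from the 1st RNN's state.
from tensorflow.keras import layers, Model

context_in = layers.Input(shape=(None, 8))      # "context" sequence
deal_in = layers.Input(shape=(None, 4))         # "transaction" sequence

# 1st RNN encodes the context into a state vector.
_, state_h, state_c = layers.LSTM(32, return_state=True)(context_in)

# 2nd RNN processes the transactions, initialized with the context state,
# so "transaction + its context" passes through both networks as a pair.
decoded = layers.LSTM(32)(deal_in, initial_state=[state_h, state_c])
out = layers.Dense(1, activation="sigmoid")(decoded)

model = Model([context_in, deal_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")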

but if you really want to separate the processing of context and transaction (context separately, transactions separately) -- so far such a construction reminds me of a sandwich (or butter on butter, smearing the interrelations and dependencies between the phenomena away from each other - in 2 layers)... I don't pretend to interpret your TechSuite, but I have voiced my concerns and my suggestion that something may still be worth preserving in the modelling process - namely the Relationships!.. I wish you a beautiful (reflective of reality, not butter-on-butter) Network Architecture!

p.s. ) the eternal problem of "contextual advertising" - "the main thing is not to lose touch with reality" (only their weighting setup is sometimes crooked - I won't point fingers at anyone - or they worked with small samples in the wrong direction)

 
JeeyCi #:

up-sampling & down-sampling are for imbalanced datasets and small training sets - if that's what you mean - i.e. giving higher weights to the smaller class... Yes, probably to increase them (true positives)...

***

and about 2 models - well, it's probably possible to filter twice: first the signals, to set the weights, then the trades on them according to those weights (triggered by the inputs at the 2nd weighing)... although it does look as if one could learn on deals together with their context - and keep the gradient for the earlier part of the time series - a good idea... BUT the usual implementation when working with context is still a bit different: the task is to encode the "transaction and its context", and the 2nd RNN takes the processing result of the 1st for decoding at the output -- which has little to do with running 2 networks on 2 different tasks (e.g. context and transactions), since in fact the "transaction and context" (as a pair!!!) is processed and passed through both networks... - that only solves the speed issue, not (or to a lesser extent) the validity of the output... imho...

but if you really want to separate the processing of context and transaction (context separately, transactions separately) -- so far such a construction reminds me of a sandwich (or butter on butter, smearing the interrelations and dependencies between the phenomena away from each other - in 2 layers)... I don't pretend to interpret your TechSuite, but I have voiced my concerns and my suggestion that something may still be worth preserving in the modelling process - namely the Relationships!.. I wish you a beautiful (reflective of reality, not butter-on-butter) Network Architecture!

p.s. ) the eternal problem of "contextual advertising" - "the main thing is not to lose touch with reality" (only their weighting setup is sometimes crooked - I won't point fingers at anyone - or they worked with small samples in the wrong direction)

The concept of context is perhaps not very useful in the case of time series. There is no clear division there; both models are involved in the prediction. One handles direction, the other timing. I would say they are equivalent. The question is how to optimize the search for the best trading situations based on an analysis of the models' errors, and whether that is possible at all. I can retrain one or the other sequentially. After each retraining of the pair, the result must improve on new data. That is, it must be able to extract a pattern from the training sample and gradually improve on new data it hasn't seen. A non-trivial task.

I've tried sending the examples that are poorly predicted by the first model into the "don't trade" class of the second model, and throwing them out of the training sample for the first model. The first model's error accordingly dropped to almost zero. The second's is small too. But that doesn't mean it will be good on new data.
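A rough Python/scikit-learn sketch of that scheme on stand-in data (not the actual code, just the logic as described): out-of-fold errors of the 1st model define the "don't trade" class of the 2nd, and the badly predicted examples are dropped from the 1st model's training sample before refitting.

# Sketch: 1st model predicts direction; examples it predicts badly (out of fold)
# become the "don't trade" class of the 2nd model and are removed from the
# 1st model's training sample.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 10))
y_dir = (rng.random(2000) > 0.5).astype(int)    # 1 = buy, 0 = sell (stand-in labels)

m1 = RandomForestClassifier(n_estimators=200, random_state=0)
oof = cross_val_predict(m1, X, y_dir, cv=5)     # out-of-fold direction predictions
good = oof == y_dir                             # True where the 1st model was right

# 2nd model: 1 = trade (1st model was right here), 0 = don't trade.
m2 = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, good.astype(int))

# 1st model is refit only on the examples it was able to predict,
# which is why its training error collapses toward zero.
m1.fit(X[good], y_dir[good])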

It is some kind of combinatorial problem: find the right buy and sell at the right time.

Maybe it is impossible to find the solution here
 
Maxim Dmitrievsky #:
The concept of regularity implies repeatability, that's important!

If one cluster can predict something with 90% probability and is repeated at least 200 times, we can assume it is a pattern.
Or not a cluster but, say, a logical rule.

When you are dealing with a complex model (complex in the sense of hard to grasp), you lose the ability to track the repeatability of its internal patterns; in other words, you lose the ability to distinguish patterns from fitting...

Once you understand this, you immediately realize that neural networks go in the trash, whereas wooden (tree) models can be decomposed into rules, and statistics can then be computed on those rules.
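For illustration, a small Python/scikit-learn sketch of that idea on made-up data: fit a shallow tree, print it as explicit rules, then count for each leaf how many times it fires and how often it is right - the repeatability check mentioned above (feature names and sizes are arbitrary).

# Sketch: turn a small tree into explicit rules, then count how often each
# leaf (rule) fires and how accurate it is.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 4))
y = (X[:, 0] + 0.1 * rng.normal(size=5000) > 0).astype(int)  # one real dependency

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["f0", "f1", "f2", "f3"]))  # the rules themselves

# Per-leaf statistics: how many samples hit the rule and how often it is right.
leaf = tree.apply(X)
for leaf_id in np.unique(leaf):
    mask = leaf == leaf_id
    hits = mask.sum()
    acc = (tree.predict(X[mask]) == y[mask]).mean()
    print(f"rule/leaf {leaf_id}: fired {hits} times, accuracy {acc:.2f}")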
 
mytarmailS #:
The concept of regularity implies repeatability, that's important!

If one cluster can predict something with 90% probability and is repeated at least 200 times, we can assume it is a pattern.
Or not a cluster but, say, a logical rule.

When you are dealing with a complex model (complex in the sense of hard to grasp), you lose the ability to track the repeatability of its internal patterns; in other words, you lose the ability to distinguish patterns from fitting...

Once you understand this, you immediately realize that neural networks go in the trash, whereas wooden (tree) models can be decomposed into rules, and statistics can then be computed on those rules.
But a lot of features can be crammed into a NN when there are no simple dependencies, albeit without the possibility of analysis. Otherwise we might as well throw away all of machine learning and go back to the simple ways of writing a TS :) Then we can just write simple algorithms, watch how they (don't) work in the tester, adjust, watch again, etc.
 

statistics are linear, whichever way you look at them... neural networks are dumb (or smart - depends on the developer) weighting... using 2 or more Dense layers in a NN for that weighting gives non-linear dependencies (conventionally speaking, because whether it is a dependency OR just a dumb correlation is still a very big question)... but as long as even a dumb correlation works - you can try to make money on it... - the moment it stops working has to be detected in time (you need to notice some kind of anomaly - whether random or systematic is another question - and then, as usual, settle your own question of risk/profitability)
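A minimal Keras sketch of that point (sizes are arbitrary): stacking Dense layers only gives non-linear dependencies if there is a non-linear activation between them; two purely linear Dense layers collapse back into a single linear weighting.

# Sketch: non-linearity comes from the activation between Dense layers,
# not from the stacking itself.
from tensorflow.keras import layers, models

linear_stack = models.Sequential([           # still a linear model overall
    layers.Input(shape=(10,)),
    layers.Dense(16, activation=None),
    layers.Dense(1, activation=None),
])

nonlinear_stack = models.Sequential([        # this one can fit non-linear dependencies
    layers.Input(shape=(10,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
nonlinear_stack.compile(optimizer="adam", loss="binary_crossentropy")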

the convenience of a NN is its flexibility - you can feed quite a different "nomenclature" to the input and get quite a different "nomenclature" at the output - i.e. the transformations we need can be done inside the network itself... and in multi-threaded mode (depends on the library)... not just statistics...

Whether or not you need statistics to find an entry is another question...

knowledge and experience help more often than statistical processing - because the former focuses on specifics, the latter on reduction to a common denominator...

Everything has its place - statistics as well...

***

the point is that for a robot there is no other way to explain things (and it will not explain them to you any other way) except via probabilities derived from numbers... - that's just how the world's economics has been digitized - into 0s and 1s... so we have to digitize the inputs to get output probabilities and set the confidence levels we trust (not necessarily statistical ones)... and we can trust anything (it's subjective) - either the bare binary logic or the weighted result of that binary logic (i.e. % probabilities over the whole range of potential outcomes)... -- it's just a matter of taste and habit, not a subject for an argument about the search for the Grail...
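A tiny Python sketch of that "confidence we trust" idea (the 0.6 threshold is an arbitrary matter of taste, as said above): the model only ever gives a probability, and the decision rule is ours.

# Sketch: probabilities in, a subjective confidence threshold on top.
import numpy as np

def decide(prob_up: np.ndarray, confidence: float = 0.6) -> np.ndarray:
    """+1 = buy, -1 = sell, 0 = stay out when the model is not confident enough."""
    out = np.zeros_like(prob_up, dtype=int)
    out[prob_up >= confidence] = 1
    out[prob_up <= 1.0 - confidence] = -1
    return out

print(decide(np.array([0.72, 0.55, 0.41, 0.18])))   # -> [ 1  0  0 -1]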

(and whether the input goes into a forest or into a neural network is already a detail)

no one has forbidden the joint use of trees/forests and neural networks within the same project... - the question is where to apply what and when (speed and memory matter), not which is better... - better not to lose time - in the sense that "timing apart from the transaction is lost time, just as a transaction apart from timing is an unknown transaction"
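Just as one illustration of such joint use (a sketch with arbitrary names and data, not a recommendation of which is better): the forest ranks the features, a small network then classifies on the reduced set.

# Sketch: forest and network in one pipeline - one possible split of roles.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(3000, 20))
y = (X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=3000) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top = np.argsort(forest.feature_importances_)[-5:]   # keep the 5 most important features

net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
net.fit(X[:, top], y)
print("train accuracy on selected features:", net.score(X[:, top], y))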

 
Such a long rant and such a weak conclusion :) Even if we abstract away from time, one model (regularized, not overfitted) cannot learn a good ratio of profitable to unprofitable trades, or the exclusion of unprofitable ones. You cannot get rid of the classification error, which shows up as an artificial deterioration of the TS's trading results even on the training sample.
 

No model can give more than probabilities (which is both an advantage and a disadvantage of any digitalization), even if those probabilities are not weighted... I don't poison myself with sandwiches and don't advise anyone else to - no one has cancelled Bayes (even if you don't put it in the code, and especially if you do)...
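A tiny illustration of Bayes being there whether or not it is written in the code (here it is written; the flat Beta(1,1) prior is an arbitrary choice): updating a strategy's win probability as trades accumulate.

# Sketch: Beta-binomial update of the win rate from observed trades.
def update_win_rate(wins: int, losses: int, prior_a: float = 1.0, prior_b: float = 1.0):
    """Posterior mean of the win probability under a Beta(prior_a, prior_b) prior."""
    a = prior_a + wins
    b = prior_b + losses
    return a / (a + b)

print(update_win_rate(wins=30, losses=20))   # ~0.596 after 50 trades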

p.s. And you must be a McDonalds fan... - hypothesis, I won't check it...

Algorithmics is dearer than your conclusions

 
JeeyCi #:

No model can give more than probabilities (which is both an advantage and a disadvantage of any digitalization), even if those probabilities are not weighted... I don't poison myself with sandwiches and don't advise anyone else to - no one has cancelled Bayes (even if you don't put it in the code, and especially if you do)...

p.s. And you must be a McDonalds fan... - Hypothesis, not going to test it...

Algorithmics is dearer than your conclusions.

Sandwiches are widely used - any deep net is one. There are different tricks for different tasks. But if you think narrowly, then any copier is a Xerox and any burger is a McDonald's.
You can become a hostage to your own stereotypes without ever trying anything. And stereotypes don't come from layering 😀
In my answer, implicitly, I used a second, clarifying model - one that singles out from the generalized knowledge the specifics that better fit the situation.