# Machine learning in trading: theory, practice, trading and more - page 189


Vizard_: Have you already forgotten who told you back in 9 y.y. that your network was not configured properly, and that with PolyAnalyst you could pull out the formula))))

There are no pretensions or secrets here: I use standard data-mining tools, sometimes with minor tweaks. If you are really interested, listen.

But the reality is a bit different... The previous experiment was on real data; this one I did on simple artificial data. The first set was recognized absolutely correctly. Then I added ......... The answer should have been 100%, but jPrediction 11 performed so poorly that it produced far too many "ii"))) In short, it still needs fine-tuning; the device doesn't work yet. I won't throw out the data; you're showing off, so you'll figure it out... Maybe I'll take another look at version 20, if the "advertising" is like today's)))

SanSanych Fomenko:Well, maybe I just had a different idea about the meaning of the word "interaction."

There are clear rules for handling interactions in linear models. They are a bit more complicated than handling a plain linear combination: https://www.r-bloggers.com/interpreting-interaction-coefficient-in-r-part1-lm/

But you have to dig through a lot of combinations to find the meaningful interactions, and that is the real snag.
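As an illustration of why an interaction term matters (a sketch on synthetic data, not from the thread): an additive linear model cannot capture a product effect of two inputs, while adding the single interaction column x1*x2 recovers both the fit and the coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# The true model contains an interaction: y depends on x1*x2,
# not only on x1 and x2 separately.
y = 1.0 + 0.5 * x1 - 0.7 * x2 + 2.0 * x1 * x2 + rng.normal(scale=0.1, size=n)

def fit_r2(X, y):
    # Ordinary least squares via lstsq; returns coefficients and R^2.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, 1.0 - resid.var() / y.var()

X_plain = np.column_stack([np.ones(n), x1, x2])           # additive model
X_inter = np.column_stack([np.ones(n), x1, x2, x1 * x2])  # + interaction

beta_plain, r2_plain = fit_r2(X_plain, y)
beta_inter, r2_inter = fit_r2(X_inter, y)
# The additive fit misses most of the variance; with the x1*x2 column the
# estimated interaction coefficient beta_inter[3] comes out close to 2.0.
```

The snag mentioned above is that with p predictors there are p*(p-1)/2 candidate pairwise columns, and each has to be screened like this.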

Mihail Marchukajtes: Are you aware that repeated optimizations always give different results? That is optimization: if it always arrived at the same answer it would be nice, but it would also be strange. Try optimizing several times; I'm sure that 8 times out of 10 you will get your 100%. So it's like this....

He doesn't even realize that the total sample is divided into parts at random before training: some patterns go into the training part, others into the test part. With such a breakdown it may well turn out that some of the cases needed to clarify the patterns are crowded into the test part and are not represented in the training part. And since the algorithm is trained only on the training part and has no telepathic powers to know what is in the test part, there will be errors when its generalization ability is calculated. That is, nothing surprising is happening.

But when the cases that are supposed to clarify the patterns turn out to be evenly distributed over the different parts of the sample, the learning ability is higher than in the case described above.

In other words, there is a trade-off here, and any randomness can sooner or later show its undesirable side.
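The effect described above can be demonstrated with a toy experiment (the data, the numbers, and the 1-NN classifier are all my own illustration): a small cluster of class-1 cases sits near the class-0 region, and whenever a random split pushes the whole cluster into the test half, the model has never seen it and loses accuracy there.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two well-separated common clusters plus a small rare cluster of class 1
# that lies closer to the class-0 region.
common0 = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(40, 2))  # class 0
common1 = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(40, 2))  # class 1
rare1 = rng.normal(loc=[0.0, 2.0], scale=0.3, size=(4, 2))     # rare class 1
X = np.vstack([common0, common1, rare1])
y = np.array([0] * 40 + [1] * 40 + [1] * 4)

def knn1_accuracy(Xtr, ytr, Xte, yte):
    # 1-nearest-neighbour: predict the label of the closest training point.
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    return float(np.mean(ytr[d.argmin(axis=1)] == yte))

accs = []
for seed in range(200):
    idx = np.random.default_rng(seed).permutation(len(y))
    tr, te = idx[: len(y) // 2], idx[len(y) // 2 :]
    accs.append(knn1_accuracy(X[tr], y[tr], X[te], y[te]))
accs = np.array(accs)
# Accuracy varies from split to split: when the whole rare cluster lands in
# the test half, its cases are classified from the nearby class-0 cluster
# and come out wrong -- the "uneven distribution of patterns" effect.
```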

Is it possible to find a method by which the general sample would be divided into parts deterministically rather than randomly? So far, experience shows that any determinism in dividing the sample is fraught with curve fitting and subsequent overtraining.

Yury Reshetov: Is it possible to find a method by which the general sample would be divided into parts deterministically rather than randomly? ...

Andrey Dik: Maybe you need to run the training several times, dividing the sample randomly each time? Then, from the resulting set of trained models, you can choose one, and overall you can assess how good the model is.

This is already implemented in jPrediction: several different sample partitions are computed in parallel on different CPU cores (two binary classifiers, i.e. one ternary classifier, per free core), so the processor ends up 100% loaded. The problem is that the number of cores in a CPU is limited, so the probability of an irregular pattern distribution can only be reduced; minimizing it is very problematic. Unless you train patterns on supercomputers rather than on PCs.

For example, if you computed the models on the Chinese supercomputer Tianhe-2, with its 3,120,000 cores, the probability of an uneven distribution of patterns over the parts of the sample would be negligible. If you compute the patterns on a 4-core desktop (and also reserve a couple of cores for other tasks), it is not surprising that sooner or later you run into irregularity.
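The probability argument can be made concrete with hypothetical numbers (the sample size, test size, and group size below are my own, not from the thread): the chance that a small group of related cases is entirely absent from the training half shrinks geometrically with the number of independent partitions.

```python
from math import comb

# Hypothetical sample: n cases split in half, containing a group of m
# related rare cases.
n, test_size, m = 86, 43, 6

# Probability that all m rare cases land in the test half of one random
# split, so the training part never sees them (a hypergeometric count).
p_bad = comb(n - m, test_size - m) / comb(n, test_size)

# With K independent partitions, the rare group is missing from *every*
# training part only with probability p_bad ** K.
for K in (1, 4, 40):
    print(f"K={K}: all partitions bad with probability {p_bad ** K:.3g}")
```

So even a handful of independent partitions makes "the rare pattern was never in training anywhere" astronomically unlikely; the remaining question is how to aggregate or select among the resulting models.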

Yury Reshetov: This is already implemented in jPrediction, i.e. several different sample partitions are computed in parallel on different CPU cores ...

That is, it is useful to do this. So instead of 4 partitions, which is obviously not enough, you should do 40. On 4 cores that would take 10 times longer to compute, but I suppose time can be sacrificed in favor of robustness.

"If it can be done and if it will do any good, it should be done." (c) Papo Carlo Albertovich.
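The "40 partitions on 4 cores" idea can be sketched with a parallel map (the trainer below is a mock stand-in; a real one would split the sample by seed, fit a model, and score it on the held-out part — and, being CPU-bound, would use ProcessPoolExecutor rather than threads):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def train_on_partition(seed):
    # Mock stand-in for one training run on a seed-specific random
    # partition; returns the seed and a fake "generalization" score.
    rng = random.Random(seed)
    return seed, 0.7 + 0.3 * rng.random()

# 40 partitions evaluated with a pool of 4 workers, mirroring 4 CPU cores:
# each worker picks up the next partition as soon as it finishes one.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(train_on_partition, range(40)))

scores = [score for _, score in results]
best_seed, best_score = max(results, key=lambda r: r[1])
# With many partitions one can pick the best model, or better, inspect the
# whole score distribution for robustness.
```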

Andrey Dik: That is, it is useful to do this. So instead of 4 partitions, which is obviously not enough, you should do 40 ...

Not necessarily.

For example, waiting 10 hours for a calculation instead of 1 hour is unacceptable for day trading. Even if we leave the computer on overnight, we obtain a model built on inevitably outdated data.

So there needs to be a sensible compromise between computing time and model quality. And the best option: everything that can be computed in parallel should be parallelized, while everything that cannot should be computed sequentially.

As a last resort, you can upgrade your computer to a larger number of cores, or build a computing cluster of several personal computers.

I am not even talking about the fact that machine learning algorithm code often has potential opportunities for further optimization.

It is also possible that some part of multitasking can be transferred from CPU to GPU.

I.e. there are a lot of potential solutions to the problem (the list can go on and on), and "making a hump" in software is not the best of them; as experience shows, it is often the most inadequate.

Yury Reshetov: It's not obvious. For example, it is unacceptable to wait 10 hours for calculations instead of 1 hour for day trading ...

I do not insist on the "hump option"; I am just saying: the more variants the data is divided into, the better the training that can be obtained by analysing the results. Say that in 90% of cases the model shows adequate results on the test data and only in 10% it turns out overtrained; then the model itself is worth something. If it is the other way around, it needs reworking. And if we divide the data into only 4 different variants, the probability of ending up with an overtrained model is extremely high.

Again, I'm not touching on the "hardware" aspects, just clarifying the "software" ones.
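The 90%/10% check described above can be written down directly (the scores below are simulated; in practice they would be the test-set results of the differently partitioned training runs, and the 0.75 cut-off is my own hypothetical choice):

```python
import random

rng = random.Random(7)
# Simulated test scores of 40 training runs on different random partitions.
scores = [rng.gauss(0.85, 0.05) for _ in range(40)]

threshold = 0.75  # hypothetical cut-off for an "adequate" test result
frac_good = sum(s > threshold for s in scores) / len(scores)

# The rule from the text: if ~90% of partitions give adequate test results,
# the model is worth something; if it is the other way round, rework it.
model_is_worth_something = frac_good >= 0.9
print(f"{frac_good:.0%} of partitions adequate -> keep: {model_is_worth_something}")
```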

Vizard_:...

Not having advantages over the known..... but nobody will believe)))

Give a concrete example of the known ... that "have advantages".

And from you there is nothing but unsubstantiated criticism, which always ends the same way: you won't give out the sample and won't show the software (everything strictly classified, witnesses removed). Yet you quote unrealistic figures that no one but you can either confirm or deny.

A banal question for you: if, on quotes, you "have the possibility to get" 92-odd percent generalization ability, why are you still engaged in empty criticism of something "having no advantages over the known ..." instead of buying factories, newspapers, steamships, islands, yachts, etc.? And when will I be able to admire your face on the cover of Forbes?