# Machine learning in trading: theory, models, practice and algo-trading

Good afternoon, everyone,

I know there are machine learning and statistics enthusiasts on the forum. I propose that in this topic we discuss (without holy wars), share, and enrich our common knowledge bank in this interesting field.

For beginners, and not only for them, there is a good theoretical resource in Russian: https://www.machinelearning.ru/.

A small review of literature on methods for selecting informative features: https://habrahabr.ru/post/264915/.

I propose problem number one. I will post its solution later. SanSanych has already seen it, so please do not give away the answer.

Introduction: to build a trading algorithm, you need to know which factors will serve as the basis for predicting the price, the trend, or the direction in which to open a deal. Selecting such factors is not an easy task; it can be arbitrarily complex.

Attached is an archive with an artificial csv dataset that I made.

The data contains 20 variables with the prefix input_, and one rightmost variable output.

The output variable depends on some subset of the input variables (the subset can contain from 1 to 20 inputs).

Task: using any methods (machine learning), select the input variables that can be used to determine the state of the output variable on the existing data.

The solution can be posted here in the form: input_2, input_19, input_5 (example). You can also describe the dependence you found between the inputs and the output variable.
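One possible first pass at a task like this is to score each input against the output separately. The sketch below is only an illustration under loud assumptions: it does not read the attached archive, it generates a synthetic stand-in with 20 `input_` columns, and the dependence (output driven by `input_3` and `input_7`) is invented for the demo. It ranks inputs by mutual information after equal-frequency binning. Note that a per-input score like this can miss dependencies that only show up in combinations of inputs, so it is not claimed to solve the actual problem.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def discretize(values, bins=4):
    """Equal-frequency binning: replace each value by the bin of its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order):
        out[i] = rank * bins // len(values)
    return out


def rank_inputs(columns, output, bins=4):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {name: mutual_information(discretize(col, bins), output)
              for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic stand-in, NOT the dataset from the thread.
rng = random.Random(0)
n = 2000
columns = {f"input_{k}": [rng.gauss(0, 1) for _ in range(n)]
           for k in range(1, 21)}
# Hypothetical dependence, invented for this demo only.
output = [1 if columns["input_3"][i] + columns["input_7"][i] > 0 else 0
          for i in range(n)]

ranking = rank_inputs(columns, output)
top2 = sorted(name for name, _ in ranking[:2])
print(top2)
for name, score in ranking[:5]:
    print(f"{name}: {score:.4f} bits")
```

With real data, the `columns` dictionary would be filled from the csv file instead of the random generator, and the cut-off between "informative" and "noise" scores would need to be checked, for example against scores obtained on a shuffled output column.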

Well done to whoever manages it :) I will then provide the finished solution and an explanation.

Alexey

Deus Ex Machina.

These are the words that open the pages of many years' worth of philosophical treatises.

So, does no one want to do some machine learning?

Alexey Burnakov:

Deus Ex Machina.

These are the words that open the pages of many years' worth of philosophical treatises.

So, does no one want to do some machine learning?

Every deal carries risk and other conditions; machine learning uses old data, that is, it operates on something that does not exist.

yerlan Imangeldinov:
Every deal carries risk and other conditions; machine learning uses old data, that is, it operates on something that does not exist.

More precisely, on what existed before.

And in that data it looks for a stable dependence.

That is what we are after.

Alexey Burnakov:

More precisely, on what existed before.

And in that data it looks for a stable dependence.

That is what we are after.

That is exactly the weakness: the market itself learns, through the Soros function, so it is better not to use old data.

yerlan Imangeldinov:
Every deal carries risk and other conditions; machine learning uses old data, that is, it operates on something that does not exist.
And do you have new data? So you don't even look at the chart, since it shows old data? Right?

Dmitry Fedoseev:
And do you have new data? So you don't even look at the chart, since it shows old data? Right?
You took the words right out of my mouth.

Anyway, here goes. To liven up the topic a bit, I promise to transfer 5 credits to whoever solves the given problem correctly.

Just post the set of informative inputs.

The community gave me these credits for activity on the forum. I will return them to the system, and in exchange the forum will get an interesting discussion.

Alexei

The stated topic of Machine Learning is important, complex, and huge. Judging by the first post, you want to start with one of the preparatory and important steps, "Evaluation and choice of predictors". What do you want to solve or show with the given problem? A new method, a technique, or what?

The content and the title of the topic do not match.

Specify the goal, and maybe interested people will turn up.

Few people have free time to solve problems with unclear goals.

Good luck

I have no idea what to do with it:
Every deal carries risk and other conditions; machine learning uses old data, that is, it operates on something that does not exist.

We always learn from the past.

We have been looking at charts for centuries. Now we see "three soldiers", now we see "head and shoulders". We have already seen so many of these figures, we believe in them, and we trade on them...

And what if the task is set as follows:

1. To automatically find such figures, not for all charts but for a particular currency pair, and ones that occurred recently, not three centuries ago in Japanese rice trading.

2. To settle the question of the initial data on which we automatically search for such figures (patterns).

To answer the first question, let us consider an algorithm called "random forest". The algorithm works with 5, 10, 100, 200 ... input variables. It takes the entire set of values of those variables at one point in time, corresponding to one bar, and searches for a combination of input variables that, on historical data, corresponds to a definite result, for example a BUY order. Another set of combinations corresponds to the other order, SELL. A separate tree corresponds to each such set. Experience shows that the algorithm finds 200-300 trees for an input set of 18000 bars (about 3 years). This is the set of patterns: near analogues of "head and shoulders" and whole companies of soldiers.
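To make the description above more concrete, here is a minimal from-scratch sketch of the random forest idea: each tree is grown on a bootstrap sample of bars, each node looks at a random subset of inputs, and the forest votes BUY or SELL. Everything in it is an assumption made for illustration: the five binary inputs, the rule that generates the labels (BUY only when inputs 0 and 1 are both 1), and the parameters `n_trees`, `n_features`, and `max_depth`. It is not the implementation discussed in the thread, and a real library implementation would be the sensible choice in practice.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score, n = None, gini(labels), len(rows)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, n_features, max_depth, rng):
    """Grow one tree; leaves are labels, inner nodes are tuples."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    # Each node only sees a random subset of the inputs.
    features = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left],
                       n_features, max_depth - 1, rng),
            build_tree([r for r, _ in right], [y for _, y in right],
                       n_features, max_depth - 1, rng))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def build_forest(rows, labels, n_trees=30, n_features=3, max_depth=4, seed=0):
    rng, n, forest = random.Random(seed), len(rows), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of bars
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx],
                                 n_features, max_depth, rng))
    return forest


def predict(forest, row):
    """Majority vote of all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


data_rng = random.Random(1)


def make_bars(n):
    """Synthetic bars: 5 binary inputs, label set by a made-up rule."""
    rows = [[data_rng.randint(0, 1) for _ in range(5)] for _ in range(n)]
    labels = ["BUY" if r[0] == 1 and r[1] == 1 else "SELL" for r in rows]
    return rows, labels


train_rows, train_labels = make_bars(400)
test_rows, test_labels = make_bars(200)
forest = build_forest(train_rows, train_labels)
accuracy = sum(predict(forest, r) == y
               for r, y in zip(test_rows, test_labels)) / len(test_rows)
print(f"hold-out accuracy: {accuracy:.3f}")
```

In this toy setting the labels follow a clean rule, so the forest has something stable to find. On market data the same procedure will also return trees when the inputs are noise.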

The problem with this algorithm is that such trees can pick up specifics that will not occur in the future. This is called "superfitting" here on the forum and "overfitting" in machine learning. It is known that a large set of input variables can be divided into two parts: those related to the output variable and those unrelated to it, which are noise. So Burnakov is trying to weed out the inputs that are irrelevant to the output.
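The effect is easy to reproduce on purpose. The sketch below swaps in a different, deliberately simple model, a 1-nearest-neighbour memorizer rather than a random forest, and feeds it inputs that are pure noise with respect to the BUY/SELL label. All data and names here are made up for the demonstration. On the bars it was trained on, the model looks perfect; on new bars it does no better than a coin flip.

```python
import random

rng = random.Random(42)


def make_noise_bars(n, n_inputs=10):
    """Inputs and labels are generated independently: the inputs are pure noise."""
    rows = [[rng.gauss(0, 1) for _ in range(n_inputs)] for _ in range(n)]
    labels = [rng.choice(["BUY", "SELL"]) for _ in range(n)]
    return rows, labels


def nearest_label(train_rows, train_labels, row):
    """Label of the training bar closest to `row` (squared Euclidean distance)."""
    best_i = min(range(len(train_rows)),
                 key=lambda i: sum((a - b) ** 2
                                   for a, b in zip(train_rows[i], row)))
    return train_labels[best_i]


def accuracy(train_rows, train_labels, rows, labels):
    """Share of `rows` whose predicted label matches the true one."""
    hits = sum(nearest_label(train_rows, train_labels, r) == y
               for r, y in zip(rows, labels))
    return hits / len(rows)


train_rows, train_labels = make_noise_bars(300)
test_rows, test_labels = make_noise_bars(300)

# Each training bar is its own nearest neighbour, so the model simply
# recalls the label it memorized.
train_acc = accuracy(train_rows, train_labels, train_rows, train_labels)
# New bars share nothing with the memorized ones except noise.
test_acc = accuracy(train_rows, train_labels, test_rows, test_labels)

print(f"accuracy on training bars: {train_acc:.2f}")
print(f"accuracy on new bars:      {test_acc:.2f}")
```

The gap between the two numbers is the practical symptom of overfitting, which is why a model should be judged on bars it has not seen and why noise inputs are worth removing before training.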

PS.

When building a trend TS (BUY, SELL) any kind of variables are related to noise!

Judging by the first post, you want to start with one of the preparatory and important steps, "Evaluation and choice of predictors". What do you want to solve or show with the given problem? A new method, a technique, or what?

The content and the title of the topic do not match.

Specify the goal, and maybe interested people will turn up.

Few people have free time to solve problems with unclear goals.

Ok.

If someone solves it, or at least comes close to the right solution (that is, if the topic stays alive), then I will:

post the correct solution, that is, the algorithm used to generate the dataset

explain why a number of other "Predictor Estimation and Selection" algorithms failed

post my own method, which solves similar problems robustly and sensitively; I will give the theory and post the code in R.

All of this is meant to enrich our mutual "understanding" of machine learning tasks.