Machine learning in trading: theory, models, practice and algo-trading - page 2648

 

I wonder how a dimensionality reduction algorithm sees samples with different data types, with and without normalisation.

For example, the data contains both string and numeric columns.

q1           q2
1    c -1.630015623
2    c  1.781979246
3    b -0.598134088
4    a -0.611477494
5    b -0.347432530
6    b -0.474427356
7    e -1.048827859
.....

I first convert q1 into numbers.

q1           q2
1    3 -1.630015623
2    3  1.781979246
3    2 -0.598134088
4    1 -0.611477494
5    2 -0.347432530
6    2 -0.474427356
7    5 -1.048827859

.... 

done
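In R this conversion is a one-liner; a minimal sketch, assuming the data sit in a data.frame called df with a character column q1 and a numeric column q2 (names are illustrative):

# assume df has a character column q1 ("a".."e") and a numeric column q2
df$q1 <- as.integer(factor(df$q1))   # integer codes follow the alphabetical order of the levels
head(df, 7)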

Now we feed this to the UMAP algorithm and get the two-dimensional embedding.

                  [,1]         [,2]
    [1,]   6.762433406   9.08787260
    [2,] -21.488330368  10.67183802
    [3,]   6.810413818   9.35273386
    [4,] -20.950310976  15.20258097
    [5,]  32.100723691  -9.74704393
    [6,]   6.892939805  16.84639975
    [7,] -17.096480607  -6.63144430

visualise the points

Nice worms we got ))

Let's try to colour the points with the variable q1.
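For reference, a minimal sketch of these two steps with the umap package (the implementation actually used here may differ, so treat the calls as illustrative):

library(umap)

emb <- umap(as.matrix(df))$layout                     # 2-D embedding coordinates
plot(emb, pch = 19, xlab = "dim 1", ylab = "dim 2")   # scatter plot of the embedding
plot(emb, pch = 19, col = df$q1,                      # same plot, coloured by q1
     xlab = "dim 1", ylab = "dim 2")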


As we can see, the variable q1 creates the structure of these "worms": it pulls the importance onto itself and reduces the contribution of the variable q2.

This is because the variable q1 takes larger values and the data is not normalised.

If we normalise the data, each variable makes an equal contribution and we get a different picture.

I understand that for some participants this is obvious stuff - normalise the data, blah blah blah -

but have you ever thought that by increasing or decreasing the contribution of variables you can control the clustering?
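A minimal sketch of both ideas, continuing the example above (df and library(umap) already in place); the weight of 3 is an arbitrary value chosen for illustration:

X <- scale(as.matrix(df))            # zero mean, unit variance: equal contributions
emb_scaled <- umap(X)$layout

Xw <- X
Xw[, "q1"] <- Xw[, "q1"] * 3         # inflate q1's contribution to steer the clustering
emb_weighted <- umap(Xw)$layout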

 
mytarmailS #:

visualise the points


It looks like parasites :)

 
mytarmailS #:

But have you ever thought that by increasing or decreasing the contribution of variables, clustering can be controlled?

Yes, by deliberately over- or understating the significance.
But it's an art, and it's hard to analyse.
The situation is aggravated by the non-stationarity of prices. I have been fighting with features for a long time: if you change the scale or the normalisation, the properties of the trained model change.
 
Maxim Dmitrievsky #:
with non-stationary prices, I have been fighting with features for a long time.
We're all fighting it.
 
mytarmailS #:

I wonder how a dimensionality reduction algorithm sees samples with different data types, with and without normalisation.

For example, the data contains both string and numeric columns.

First I convert q1 into numbers.

It is better to convert strings into categorical form rather than numeric - if, of course, your UMAP can process categories.

a=1 is not 5 times different from e=5. They are just different, like warm and soft. And by turning them into numbers you have implied that some of them are "warmer" than others.

 
elibrarius #:

a=1 is not five times different from e=5. They're just different,

Hmm, yeah, you're absolutely right, I was stupid.

You have to do a one-hot encoding or something.
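For example, a one-hot (dummy) encoding can be built with model.matrix; a minimal sketch on the rows shown above (values rounded, object names illustrative):

d <- data.frame(q1 = factor(c("c", "c", "b", "a", "b", "b", "e")),
                q2 = c(-1.63, 1.78, -0.60, -0.61, -0.35, -0.47, -1.05))

X <- model.matrix(~ q1 - 1, data = d)   # one 0/1 column per category of q1
X <- cbind(X, q2 = d$q2)                # keep the numeric variable as-is
head(X)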
 
Aleksey Nikolayev #:

I think that the question of what to do with the identified boxes is complex and hardly has clear, unambiguous rules for all possible cases. A good, well-thought-out algorithm is probably quite a secret "know-how").

If the boxes are obtained on the same set of predictors, it is probably enough that they do not intersect. If there is an intersection, it can be allocated to a separate box, and its complements can be split into several boxes. However, too large a number of boxes will fragment the sample too much. Therefore, we can generalise the notion of a box - in the language of rules, this means adding negations and ORs to the ANDs.

If the boxes are obtained on completely different predictors (for example, by the random forest method), then they can overlap only in the sense of the parts of the sample that fall into them. Some portfolio-like ideas are probably needed here.

If the sets of predictors overlap partially, then there must be some mixture of approaches, it is hard to say for sure.

It is not clear to me how this can be put into a unified scheme. The standard way of constructing decision trees simply and "nicely" circumvents these problems, which makes it not quite suitable for our purposes. It may be possible to improve it by selecting a pruning algorithm, but in my opinion it is better to creatively rework the rule construction algorithm.

Well, without understanding the details, it is difficult to make changes to the logic.

Personally, I did not understand what the two additional coordinates of the box are (the two quantum boundaries) - I assumed it was sample trimming.

Just looking for something useful to develop my method. I have gluing of "boxes" as well - but the algorithm is not perfect.

 
Aleksey Vyazmikin #:

There you go, without understanding the details it's hard to make changes to the logic.

I, personally, did not understand what the additional 2 coordinates of the box are (2 - quantum boundaries) - I assumed that it was a sample trimming.

Just looking for something useful to develop my method. I have gluing of "boxes" as well - but the algorithm is not perfect.

If you are talking specifically about PRIM, then my link just gave an example of how it works for two predictors x1 and x2. Accordingly, a box of the form (a1<x1<b1)&(a2<x2<b2) is selected. What is left outside the box is apparently considered to belong to a different class than what is inside. There was an attempt to show, by a simple example, the essence of the algorithm - cutting off a small piece (peeling) from the box at each step. Which piece is cut off, and by which predictor, is chosen from the condition of optimality of the "trajectory" step.

I was interested in this algorithm as an example of how a standard algorithm for building rules (for a decision tree) can and should be modified to suit one's needs.

 
Aleksey Nikolayev #:

If you are talking specifically about PRIM, then my link just gave an example of how it works for two predictors x1 and x2. Accordingly, a box of the form (a1<x1<b1)&(a2<x2<b2) is selected. What is left outside the box is apparently considered to belong to a different class than what is inside. There was an attempt to show, by a simple example, the essence of the algorithm - cutting off a small piece (peeling) from the box at each step. Which piece is cut off, and by which predictor, is chosen from the condition of optimality of the "trajectory" step.

I was interested in this algorithm as an example of how a standard algorithm for building rules (for a decision tree) can and should be modified to suit one's needs.

It's good that you've figured it out - I didn't understand it at first, thanks for the clarification.

But then it turns out that, at the first stage, the algorithm should find the pairs of predictors that separate into boxes best, and then apply "peeling" to them.

 
Aleksey Vyazmikin #:

Good that you figured it out - I didn't realise straight away, thanks for the clarification.

But, then it turns out that the algorithm at the first stage should find pairs of predictors that will better separate into boxes, and then apply "peeling" to them.

No, it works for any number of predictors. At each step, the algorithm chooses which predictor and which slice (left or right) is optimal to cut off. Conventional decision trees do the same thing: at each step, both the predictor and its cut point are chosen optimally to produce two new boxes. The only difference with PRIM is that only a small, bounded slice is cut off at each step, which makes the process gradual - hence the word "patient" in the name.
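For illustration only - not the implementation discussed here - a sketch of a single peeling step as just described; the function name, the peel fraction alpha and the use of the target's mean inside the box as the optimality criterion are my assumptions:

# One PRIM-style peeling step: try cutting a small slice (fraction alpha)
# off the left or right edge of the box along every predictor and keep the
# cut that maximises the mean of the target among the points still inside.
peel_step <- function(X, y, box, alpha = 0.05) {
  # box: 2 x p matrix, row 1 = lower bounds, row 2 = upper bounds
  in_box <- function(b) apply(X, 1, function(r) all(r >= b[1, ] & r <= b[2, ]))
  best <- list(score = -Inf, box = box)
  for (j in seq_len(ncol(X))) {
    xj <- X[in_box(box), j]
    for (side in c("low", "high")) {
      b <- box
      if (side == "low") b[1, j] <- quantile(xj, alpha)
      else               b[2, j] <- quantile(xj, 1 - alpha)
      keep <- in_box(b)
      if (!any(keep)) next
      score <- mean(y[keep])
      if (score > best$score) best <- list(score = score, box = b)
    }
  }
  best
}

# usage sketch: start from the full data range and peel repeatedly
# box <- rbind(apply(X, 2, min), apply(X, 2, max))
# for (i in 1:20) box <- peel_step(X, y, box)$box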

Personally, I find another modification of the standard approach interesting, where each box is cut into not two but three new boxes. I'll give some thoughts on this sometime.
