Machine learning in trading: theory, models, practice and algo-trading - page 3414

 

A couple more years and Alexey will realise that all his quantisation is a clumsy attempt at ordinary clustering )

 
Forester #:

k-means is available in Alglib, in Include\Math\Alglib\dataanalysis.mqh. But it is better to feed it normalised data (all features on one scale). Otherwise changes of 1000 units (e.g. volumes) will completely drown out changes of 0.01000 units (e.g. prices).

Yes, that's why I find the idea interesting - it should be relatively easy to port, in theory. Does Alglib support saving and applying models?

As for normalisation: in general it is needed, but leaves are binary, so the problem goes away by itself.
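
A small sketch of why scale stops mattering for binary inputs, assuming each leaf is encoded as a 0/1 feature (that encoding is an assumption; the post does not spell it out). For such vectors the squared Euclidean distance is simply the number of positions that differ, so no feature can outweigh another.

```python
def sq_euclidean(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hamming(a, b):
    """Number of positions in which the two vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

# Hypothetical leaf activations (1 = the sample falls into that leaf)
a = [1, 0, 0, 1, 1]
b = [1, 1, 0, 0, 1]

print(sq_euclidean(a, b), hamming(a, b))
```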

 
Aleksey Vyazmikin #:

Yes, that's why I find the idea interesting - it should be relatively easy to port, in theory. Does Alglib support saving and applying models?

As for normalisation: in general it is needed, but leaves are binary, so the problem goes away by itself.

//| k-means++ clusterisation                                          |
//| INPUT PARAMETERS:                                                 |
//|     XY       - dataset, array [0..NPoints-1,0..NVars-1].          |
//|     NPoints  - dataset size, NPoints>=K                           |
//|     NVars    - number of variables, NVars>=1                      |
//|     K        - desired number of clusters, K>=1                   |
//|     Restarts - number of restarts, Restarts>=1                    |
//| OUTPUT PARAMETERS:                                                |
//|     Info     - return code:                                       |
//|                * -3, if task is degenerate (number of             |
//|                      distinct points is less than K)              |
//|                * -1, if incorrect                                 |
//|                      NPoints/NFeatures/K/Restarts has been passed |
//|                *  1, if subroutine finished successfully          |
//|     C        - array[0..NVars-1,0..K-1], matrix whose columns     |
//|                store cluster's centres                            |
//|     XYC      - array[NPoints], which contains cluster             |
//|                indexes                                            |

Take the array C with the cluster centres, then for a new point find which of the centres it is closest to.

There is something more about clustering further down in the file; maybe there is a ready-made prediction function there. Have a look.
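
The nearest-centre lookup described above can be sketched roughly as follows, in plain Python rather than MQL5. The `nearest_centre` helper and the sample centres are hypothetical; only the layout of C (one column per centre, NVars rows) is taken from the header comment.

```python
def nearest_centre(C, point):
    """Return the index of the centre closest to `point`.

    C is laid out as in the header comment: C[var][k], one column per centre.
    """
    n_vars = len(C)
    n_clusters = len(C[0])
    best_k, best_d2 = -1, float("inf")
    for k in range(n_clusters):
        # Squared Euclidean distance is enough for picking the minimum.
        d2 = sum((point[j] - C[j][k]) ** 2 for j in range(n_vars))
        if d2 < best_d2:
            best_k, best_d2 = k, d2
    return best_k

# Hypothetical centres for NVars=2, K=3: the columns are (0,0), (5,5), (10,0)
C = [
    [0.0, 5.0, 10.0],
    [0.0, 5.0, 0.0],
]
cluster = nearest_centre(C, [4.0, 6.0])
print(cluster)
```

If the training data was normalised, the new point would need the same scaling before this lookup.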
 
Forester #:
Take the array C with the cluster centres, then for a new point find which of the centres it is closest to.

There is something more about clustering further down in the file; maybe there is a ready-made prediction function there. Have a look.

As I understand it, you need an array for each cluster that holds the weights (elements) of its centroid; without them you cannot compute anything on new data.

 
Aleksey Vyazmikin #:

As I understand it, you need an array for each cluster that holds the weights (elements) of its centroid; without them you cannot compute anything on new data.

There are no weights there; C holds the coordinates of the centres.
 
Forester #:
There are no weights there; C holds the coordinates of the centres.

As I understand it, you need mu. It is different for each predictor, hence the vector/array.

 
Aleksey Vyazmikin #:

As I understand it, you need mu. It is different for each predictor, hence the vector/array.

I think this formula is used in training, i.e. in finding the cluster centres. For prediction you only need to find the nearest centre in C[].
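
The formula under discussion is not reproduced in the thread. Assuming mu denotes the usual k-means centroid, the two stages could be written as below, where S_k (the set of training points assigned to cluster k) is notation introduced here for illustration and column k of C stores mu_k:

```latex
% Training: the centre of cluster k is the mean of its members
\mu_k = \frac{1}{|S_k|} \sum_{x_i \in S_k} x_i

% Prediction: a new point x goes to the nearest stored centre
\hat{k}(x) = \arg\min_{k \in \{0, \dots, K-1\}} \lVert x - \mu_k \rVert^2
```

Under that assumption mu_k is indeed a vector with one component per predictor, but it is already what C contains, so nothing beyond C is needed at prediction time.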
 
What features would you include in the model if you wanted to predict whether the trading system (TS) will make money on new data over the next n points?
 
Aleksey Vyazmikin #:

As I understand it, you need mu. It is different for each predictor, hence the vector/array.

As I understand it, mu is the midpoint of a segment, or of the cluster in this case.

If it were a circle, the formula would work.

 
Renat Akhtyamov #:

As I understand it, mu is the midpoint of a segment, or of the cluster in this case.

If it were a circle, the formula would work.

The wind of life is sometimes fierce
On the whole, however, life is good
And it is not terrible when the bread is black,
It is terrible when the soul is black.
