Discussion of article "Self-organizing feature maps (Kohonen maps) - revisiting the subject" - page 2

 
Nikolay Demko:

Save the trained network and post it together with the training data. I think analysing them will show how this is possible or, alternatively, reveal what the bug is.

In general, we need a reproducible example.

I attach it.

The archive contains a resource file, the somnet file, and a screenshot showing where I took the records for the resource file. Maybe it will help ;)
Files:
Desktop.zip  756 kb
 
Anyway, thanks for the development.

I have some ideas about searching for groups of items that are broadly similar, i.e. clustering. I found a method on the net: k-means. I read the description and looked at examples. What do you use to cluster data into groups?
 
Viktor Vasilyuk:
There are some shortcomings in how the results are displayed... But even in this form it is a working version.

I decided to test the statistics and this is what I got:

I was a bit surprised by the second square from the left in the first row, at values #2 and #3. How is such a sharp transition in colour possible? Compare the first square from the left in the first row: between values #14 and #18 the colours change smoothly.

Everywhere else the colours change consistently from left to right or from right to left, in the order shown in the palette under the picture. Here they jump across the palette.

I think there are three reasons. First, and probably most important, you have very little training data.

Second, the number of nodes is 4 times smaller than the resolution.

Third, with the large spread of values in the 2nd column, nodes from opposite ends of the scale happened to end up next to each other.

Taken together, this produced an arrangement with a sharply drawn boundary.

However, I could not reproduce a boundary in the shape of a clear hexagon. Your saved network does have a boundary, but it is not hexagonal.

 
Viktor Vasilyuk:
Anyway, thanks for the development.

I have some ideas about searching for groups of items that are broadly similar, i.e. clustering. I found a method on the net: k-means. I read the description and looked at examples. What do you use to cluster data into groups?

Different things, depending on the task. There are many ways of clustering. Kohonen maps are a universal clustering tool, and nothing universal can be perfect for a particular task.

For example, if you need to cluster univariate data in the fastest and easiest way, k-means is fine, but I prefer clustering by modes rather than by averages.
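For readers who have not seen k-means applied to univariate data, here is a minimal sketch in plain Python. It is only an illustration of the averaging approach mentioned above, not the code used in the article; the function name `kmeans_1d`, the evenly spread starting centres and the sample numbers are all assumptions made for the example.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalar data; returns the sorted cluster centres."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start (an assumption): spread centres evenly over the sorted data.
    centres = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            # Assign each value to the nearest centre.
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            groups[nearest].append(v)
        # Move each centre to the mean of its group; keep it if the group is empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
        if updated == centres:
            break
        centres = updated
    return sorted(centres)


# Made-up sample: two obvious groups around 2 and 102.
data = [1.0, 2.0, 3.0, 101.0, 102.0, 103.0]
centres = kmeans_1d(data, 2)
print(centres)
```

Because each centre is a mean, a single outlier can drag it away from where most values sit, which is one reason to prefer a mode-based approach for some tasks.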

 
Nikolay Demko:

I think there are three reasons. First, and probably most important, you have very little training data.

Second, the number of nodes is 4 times smaller than the resolution.

Third, with the large range of values in the 2nd column, nodes from opposite ends of the scale happened to end up next to each other.

Taken together, this produced an arrangement with a sharply drawn boundary.

However, I could not reproduce a boundary in the shape of a clear hexagon. Your saved network does have a boundary, but it is not hexagonal.

Yes, I gave you the wrong chart. Here is the original one, from the somnet file I included in the archive.

Screenshot: GBPUSD, H1, 2017.02.25, Alpari International Limited, MetaTrader 5, Demo


1) The issue is not the amount of data as such, but that very little of the data "correlates" with #2; that factor could well have a strong influence on the colour.

2) Where did the number 4 come from? Did you divide the size of the picture by the number of nodes? I don't understand the relationship. I chose 70x70 on purpose to make the picture clearer.

3) 849950 - 142695 = 707255. Can such a large difference in one column outweigh smaller differences in the other columns?

4) Is it possible to display the numbers inside the picture instead of only at the side? Some numbers are not visible. The pictures are saved to files, but the number captions do not appear on them. Is this not implemented?
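On question 3, a small sketch may help show why the scale of one column matters. The thread does not say whether the SOM code normalises its inputs, so this is a general illustration only: the rows are invented (only the range 142695 to 849950 is taken from the post), and `euclidean` and `min_max_scale` are hypothetical helper names.

```python
import math


def euclidean(p, q):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Invented rows: column 1 spans 1..9, column 2 spans 142695..849950.
a = [1.0, 142695.0]
b = [9.0, 142700.0]   # far from a in column 1, almost equal in column 2
c = [1.0, 150000.0]   # equal to a in column 1, slightly off in column 2
d = [5.0, 849950.0]   # sets the upper end of column 2
rows = [a, b, c, d]

sa, sb, sc, sd = min_max_scale(rows)

print("raw:   ", euclidean(a, b), euclidean(a, c))
print("scaled:", euclidean(sa, sb), euclidean(sa, sc))
```

On raw values the wide column decides which rows count as neighbours: `b` looks closer to `a` than `c` does. After scaling, the ordering flips, because the full-range gap in column 1 is no longer swamped.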
 
Shit. I don't know. It's already delusional or paranoid.

Screenshot: GBPUSD, H1, 2017.02.25, Alpari International Limited, MetaTrader 5, Demo


I did the following:

1) reduced the number of samples to 10;

2) manually changed the values of the second column in rows 2, 3 and 4.

What is this nonsense?

I found the following:

1) The maximum value for the second column is either calculated or displayed incorrectly. If you sort all the values in descending order, the programme shows the value in row #3 as the maximum rather than the value in row #2. I see this quirk only in this column.

2) I slightly reduced the "difference" between the maximum and minimum values of the second column. I let the three largest values in this column differ from each other by 1-1.8%. That is not much, is it? By eye, they are almost identical compared with all the other values in this column.



I am attaching my files again.
Files:
SOM.zip  90 kb
 
Viktor Vasilyuk:
Shit. I don't know. This is already delusional or paranoid.


I did the following:

1) reduced the number of samples to 10;

2) manually changed the values of the second column in rows 2, 3 and 4.

What is this nonsense?

I found the following:

1) The maximum value for the second column is either calculated or displayed incorrectly. If you sort all the values in descending order, the programme shows the value in row #3 as the maximum rather than the value in row #2. I see this quirk only in this column.

2) I slightly reduced the "difference" between the maximum and minimum values of the second column. I let the three largest values in this column differ from each other by 1-1.8%. That is not much, is it? By eye, they are almost identical compared with all the other values in this column.



I attach my files again.

Note that the maps of all the other columns also show some kind of cluster in this place.

That is, the result repeats consistently because that is how the data is structured.

It is just that in the second column this cluster of minimum values is surrounded by, or adjacent to, the maximum values. That is why the boundary is so sharp.

SOM places this data in a separate cluster next to the maxima because the maps are interconnected, and this is the best location for the cluster.

If you tried to move the two clusters to different corners of the second map, nodes in the other maps would have to move to those positions as well.

In maps 1, 4, 6 and 8-12 these two clusters are very close in value. That is, in 8 of the 12 maps SOM has placed them next to each other. Naturally, the remaining 4 maps can then end up however they happen to fall.

Or perhaps I don't understand your problem.

 
Nikolay Demko:

Note that the maps of all the other columns also show some kind of cluster at this location.

That is, the result repeats consistently because that is how the data is structured.

It is just that in the second column this cluster of minimum values is surrounded by, or adjacent to, the maximum values. That is why the boundary is so sharp.

SOM places this data in a separate cluster next to the maxima because the maps are interconnected, and this is the best location for the cluster.

If you tried to move the two clusters to different corners of the second map, nodes in the other maps would have to move to those positions as well.

In maps 1, 4, 6 and 8-12 these two clusters are very close in value. That is, in 8 of the 12 maps SOM has placed them next to each other. Naturally, the remaining 4 maps can then end up however they happen to fall.

Or maybe I'm missing the point of your problem.

Yes, there is one problem. In the data file the maximum value in the second column is 559000, but the horizontal gradient bar in the picture shows the maximum as 552000. 559000 cannot be less than 552000.
 
Viktor Vasilyuk:
Yes, there is one problem. In the data file the maximum value in the second column is 559000, but the horizontal gradient bar in the picture shows the maximum as 552000. 559000 cannot be less than 552000.

552000

559000

Is this node data or pattern data?

The nodes do not have to be one-to-one with the training patterns.
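To illustrate why node values need not match pattern values, here is a minimal one-dimensional SOM sketch in plain Python. It is not the article's implementation: `train_som_1d`, its parameters, the decay schedule and the sample values are all assumptions (only 559000 as the largest pattern is taken from the post). Each update moves a node part of the way towards a pattern, so node weights are blends of patterns and stay inside the data range.

```python
import math
import random


def train_som_1d(samples, n_nodes=5, epochs=200, seed=0):
    """Train a chain of scalar nodes with the classic SOM update rule."""
    rng = random.Random(seed)
    # Start every node on a randomly chosen pattern (an assumption).
    weights = [rng.choice(samples) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)                          # learning rate, at most 0.5
        radius = max(0.5, (n_nodes / 2.0) * (1.0 - frac))  # shrinking neighbourhood
        x = rng.choice(samples)
        # Best matching unit: the node whose weight is closest to the pattern.
        bmu = min(range(n_nodes), key=lambda i: abs(x - weights[i]))
        for i in range(n_nodes):
            influence = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
            # Move the node part of the way towards the pattern.
            weights[i] += rate * influence * (x - weights[i])
    return weights


# Made-up patterns; 559000 is the largest one.
samples = [142695.0, 300000.0, 410000.0, 552000.0, 559000.0]
weights = train_som_1d(samples)
print("largest pattern:", max(samples))
print("largest node:   ", max(weights))
```

Under this update rule the largest node weight can never exceed the largest pattern, and it is often somewhat smaller. If the gradient bar is scaled from node weights rather than from the raw patterns (the thread does not confirm this), a bar maximum below the data maximum would be expected rather than a bug.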

 
Nikolay Demko:

552000

559000

Is this node data or pattern data?

Nodes don't have to be one-to-one with training patterns.

Highlighted.