Machine learning in trading: theory, models, practice and algo-trading - page 3584

 
Aleksey Vyazmikin #:

Verified from the Russian Federation. But it was no use: they don't let me participate in the contest.

I got the link from a guy from Russia, he's also doing it on Kaggle... How do you guys solve these problems?
 
mytarmailS #:
I got the link from a guy in Russia, he's doing it on Kaggle. How do you guys solve these problems?

Well, apparently you have to specify another country when registering to participate.

 
Aleksey Vyazmikin #:

Well, apparently you have to specify a different country when registering to participate.

So he doesn't participate for the sake of participating, he lives off it.
 
mytarmailS #:
So he doesn't participate for the sake of participating, he lives off it.

If you have a chance, please ask him how to participate....

Maybe the country is a formality. As I understand it, the organiser of a particular contest sets the restriction, i.e. another contest may allow participation.

 
Aleksey Vyazmikin #:

If you have the opportunity, please ask him how to participate.....

Maybe the country is a formality. As I understand it, the organiser of a particular contest sets the restriction, i.e. another contest may allow participation.

They have an email for any questions, why not write???????

 
mytarmailS #:

They have an email for any questions, why not write???????

Understood.

 
Aleksey Nikolayev #:

There are some doubts about decision trees (for classification). In theory they should be good at finding rectangular regions, but imho they don't do it well for an arbitrary rectangle that is not adjacent to the edges of the original rectangle. Lots of rubbish leaves appear, for example. This is especially noticeable for rather weak patterns, which are the norm for our problems.

There is an idea to switch from trees to a simple rectangle search, where splits are made not one by one, but several at a time. A quick search showed that something similar already exists - RBC (Rectangular Boundary Classifier). Then it is probably possible to do bagging and maybe even boosting on it.

Another idea that naturally arises is to switch from the classification problem to the optimisation problem. For example, when one simply searches for a rectangle that maximises profit by the sum of deals that fall into it.
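The "search for a rectangle that maximises profit" idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not RBC and not anyone's actual method: two predictors per deal, candidate borders taken from the observed values, and a hypothetical function name `best_rectangle`. For every pair of x borders it sums profit per y level and then finds the best contiguous run of y levels (Kadane's algorithm), so all four borders are chosen together rather than one split at a time.

```python
from itertools import combinations_with_replacement


def best_rectangle(points, profits):
    """Return (best_profit, (x_lo, x_hi, y_lo, y_hi)) for the axis-aligned
    rectangle whose enclosed deals have the largest total profit.

    points  - list of (x, y) predictor pairs, one per deal (assumed 2-D)
    profits - profit of each deal, same order as points
    """
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    y_index = {y: i for i, y in enumerate(ys)}
    best = (float("-inf"), None)

    # Every pair of x borders with x_lo <= x_hi.
    for x_lo, x_hi in combinations_with_replacement(xs, 2):
        # Total profit per y level for deals inside the current x band.
        band = [0.0] * len(ys)
        for (x, y), p in zip(points, profits):
            if x_lo <= x <= x_hi:
                band[y_index[y]] += p

        # Kadane's algorithm: best contiguous run of y levels in this band.
        run, start = 0.0, 0
        for i, v in enumerate(band):
            if run <= 0:
                run, start = v, i
            else:
                run += v
            if run > best[0]:
                best = (run, (x_lo, x_hi, ys[start], ys[i]))
    return best
```

The result is an in-sample optimum, so on weak patterns it will overfit readily; some out-of-sample check or bagging over resamples would still be needed.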

What makes you think that trees MUST be good at finding rectangular regions? Moreover, are problems with rectangular regions really the source of classification errors?

The key question here is: what is the source of classification errors?

To me, the answer is obvious: the same predictor values in some cases belong to one class, and in another case to a different class. And no classification algorithm will separate this.

But at the preprocessing stage, you can try to correct this situation, for example, using the Tomek links algorithm.
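How often identical predictor values carry different class labels can be checked directly. A minimal sketch, assuming the predictors are discrete or already binned (with continuous values exact repeats are rare); the function names `conflicting_rows` and `irreducible_error` are made up for illustration.

```python
from collections import defaultdict


def conflicting_rows(X, y):
    """Map each feature tuple that occurs with more than one distinct
    class label to the list of row indices where it occurs."""
    groups = defaultdict(list)
    for i, row in enumerate(X):
        groups[tuple(row)].append(i)
    return {key: idx for key, idx in groups.items()
            if len({y[i] for i in idx}) > 1}


def irreducible_error(X, y):
    """Share of rows that any deterministic classifier must get wrong on
    this sample: within each group of identical rows, at best the most
    frequent label is predicted."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[tuple(row)].append(label)
    wrong = sum(len(labels) - max(labels.count(c) for c in set(labels))
                for labels in groups.values())
    return wrong / len(y)


X = [(1, 0), (1, 0), (1, 0), (0, 1), (0, 1)]
y = [1, 1, 0, 0, 0]
print(conflicting_rows(X, y))   # rows 0, 1, 2 share predictors but not labels
print(irreducible_error(X, y))
```

If the second number is already close to the model's error, the limit is in the predictors rather than in the classification algorithm.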

 
СанСаныч Фоменко #:

The key question here is: what is the source of classification errors?

For me, the answer is obvious: the same predictor values in some cases belong to one class, and in another case to another class. And no classification algorithm can separate this.

But at the preprocessing stage you can try to correct this situation, for example, using the Tomek links algorithm.

The Tomek Links algorithm (or simply Tomek) is a method used in machine learning to improve data quality by removing redundant or erroneous samples in a classification task. This method is particularly useful for dealing with unbalanced data.

The basic idea of Tomek Links is as follows:

  1. Definition of Tomek Links pairs:

    • Pairs of neighbouring objects from different classes are considered. For example, objects A and B are neighbours if the distance between them is less than the distance from either of them to any other object.
    • If A and B belong to different classes and are nearest neighbours to each other, then such a pair is called a pair of Tomek Links.
  2. Object Removal:

    • If a Tomek Links pair is detected, then one or both objects of the pair are considered as candidates for deletion.
    • It is common to delete the object belonging to the majority class (the class that is more represented in the data), or both objects, to improve the quality of the separation boundary between classes.

Example:

  • Suppose we have two classes: a positive class and a negative class.
  • For each object from the positive class, we find its nearest neighbour from the negative class and vice versa.
  • If two objects are nearest neighbours to each other and belong to different classes, they form a pair of Tomek Links.
  • These objects are either removed or only one of them is removed (depending on the implementation of the algorithm).

The purpose of using the Tomek Links method is to reduce the influence of noisy data and to improve the separation between classes, which can lead to better classification performance. This method is often used in combination with other methods for handling unbalanced data, such as resampling (oversampling and undersampling).
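The description above can be sketched in a few lines. This is a brute-force illustration, assuming Euclidean distance and a small sample (it is O(n²)); the function names are hypothetical. For real work a library implementation such as `TomekLinks` from imbalanced-learn is the more sensible choice.

```python
import math


def nearest_neighbour(i, X):
    """Index of the point closest to X[i] (Euclidean), excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def tomek_links(X, y):
    """Pairs (i, j), i < j, that are nearest neighbours of each other
    and belong to different classes."""
    nn = [nearest_neighbour(i, X) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def drop_majority_in_links(X, y, majority_label):
    """Remove only the majority-class member of every Tomek link."""
    drop = {k for pair in tomek_links(X, y) for k in pair
            if y[k] == majority_label}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0), (5.2, 5.0)]
y = [0, 1, 0, 0, 0, 0]
print(tomek_links(X, y))                      # the pair formed by points 0 and 1
X_clean, y_clean = drop_majority_in_links(X, y, majority_label=0)
```

Removing both members of each link instead is a one-line change: drop every index in the pair regardless of its label.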

 
I experimented with under/oversampling methods a long time ago. They still work too crudely for time series classification tasks with a lot of noise, so to speak. In other words, they don't yield anything :) They just add a bit more noise or remove a bit of noise.
 
Aleksey Vyazmikin #:

The iPhones were stolen at renditions, and the ones that weren't, were sold already.

Verified from the Russian Federation. But it was no use: they don't let me participate in the contest.

Buying brand-new products is always a bad deal because of the high markup. It is better to wait a couple of years, until prices come down, the technology has been tested and improved, and the software has been adapted.

Modern x86 laptops already have good battery life for browser tasks, if you really need it. I don't need it.

Well in the case of Apple tech, it usually only gets more expensive :)

When they released their ARM processor, a laptop with it cost the same as the same laptop with an Intel chip. But once people felt the difference, they stopped buying Intel's rusty hardware.