Machine learning in trading: theory, models, practice and algo-trading - page 640

 
Mihail Marchukajtes:


In all seriousness, please don't make a fuss, Mikhail. This is an important moment. If this idea does not work out, for whatever reason (whether my lack of skill or my being completely lost in the face of the new possibilities), then the next one will reach the trading community very, very soon. I am absolutely sure of it.

 
Mihail Marchukajtes:

Hold me back, all seven of you!!!! And mark this day on your calendar with a red pencil, because this is the day I downloaded R and will start spinning it up little by little...

Sensei, a little gift from the guys))) h2o.automl.

Rattle itself is so-so, but everything there runs on automl...
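For anyone who wants to try it, here is a minimal sketch of driving h2o.automl from R; the iris dataset, the 60-second budget and the variable names are just placeholders for the example, not anything from the post above.

library(h2o)                        # H2O platform with the AutoML interface

h2o.init()                          # start a local H2O instance
trainFrame <- as.h2o(iris)          # any data.frame with a factor target will do

# let AutoML train and rank several model families within a time budget
aml <- h2o.automl(x = 1:4, y = 5,
                  training_frame = trainFrame,
                  max_runtime_secs = 60,
                  seed = 1)

print(aml@leaderboard)              # ranked list of the trained models
pred <- h2o.predict(aml@leader, trainFrame)  # predictions from the best model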

 
Vizard_:
Old Fa stood in an orchard. An orange tree spread out before him. Birds were flying everywhere, chaotically.
The birds did not land on the orange tree, and he thought its fruit was poisonous. If he took a step to the side he could see
the apple tree behind the orange tree, on whose fruit a couple or even more of the chaotically flying birds
would surely land... But he kept standing in one place... Hungry, exhausted, depressed...
By the end of the sixth year it seemed to Hu Jou that he had penetrated the very essence of the art of hunting. For it was not the prey but the concept itself that had become central for him... (c)
 

http://playground.tensorflow.org

a visualization of neural network training; seems to be just for fun or as a teaching example

somehow it has obvious problems with spiral classification :)


Tensorflow — Neural Network Playground
  • Daniel Smilkov and Shan Carter
  • playground.tensorflow.org
It’s a technique for building a computer program that learns from data. It is based very loosely on how we think the human brain works. First, a collection of software “neurons” are created and connected together, allowing them to send messages to each other. Next, the network is asked to solve a problem, which it attempts to do over and over...
 

And an architecture like this can already handle it

It is just like Poincaré: if the feature space is disconnected, you need at least two layers. There was already a question about this from elibrarius


 
Maxim Dmitrievsky:

And an architecture like this can already handle it


Maxim, what about feature selection? Tsk-tsk.
Also, lower the learning rate when the network starts to oscillate.

Last summer I played with this thing. A very visual thing).
 
Aleksey Terentev:
Maxim, what about feature selection? Tsk-tsk.
Also, lower the learning rate when the network starts to oscillate.

Last summer I played with this thing. A very visual thing).

Well, yes, if you feed in the sines, it can be done with one layer
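As a rough illustration of that point, here is a sketch in R with the standard nnet package rather than the playground itself; the spiral generator, the network size and the accuracy comparison are all just assumptions for the example.

library(nnet)   # single-hidden-layer network, ships with the standard R distribution

set.seed(1)
# two interleaved noisy spirals, the same kind of problem as in the playground
n <- 200
theta <- seq(0, 3*pi, length.out = n)
spiral <- data.frame(
  x1 = c(theta*cos(theta), -theta*cos(theta)) + rnorm(2*n, sd = 0.3),
  x2 = c(theta*sin(theta), -theta*sin(theta)) + rnorm(2*n, sd = 0.3),
  class = factor(rep(c(0, 1), each = n))
)

# raw coordinates only: a small single-hidden-layer net tends to struggle here
fitRaw <- nnet(class ~ x1 + x2, data = spiral, size = 4, maxit = 500, trace = FALSE)
mean(predict(fitRaw, spiral, type = "class") == spiral$class)

# add sin(x1) and sin(x2) as extra inputs (the playground's "sin" features);
# compare the in-sample accuracy of the same single hidden layer with and without them
spiral$s1 <- sin(spiral$x1)
spiral$s2 <- sin(spiral$x2)
fitSin <- nnet(class ~ x1 + x2 + s1 + s2, data = spiral, size = 4, maxit = 500, trace = FALSE)
mean(predict(fitSin, spiral, type = "class") == spiral$class)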

 

EMVC does not do what I wanted; it does not do what it appears to from a cursory reading of the description.

EMVC takes a table with predictors and a target (classes only, no regression) and calculates the probability that each training example really belongs to its assigned class. This way you can find the rows in the training table that contradict most of the other training examples (outliers, errors) and remove them so as not to confuse the model during training.

I assumed it would be possible to find a set of predictors that gives the highest probability estimates, but the sets of predictors it found were unsatisfactory. I will not experiment with this any further; there are better tools for selecting predictors. I also cannot get at the cross-entropy estimate: the package uses it internally in some way but does not return that value to the user.

But we got an interesting tool for sifting out training examples rather than predictors.


library(EMVC)
data(iris)


trainTable <- iris # table on which some model will later be trained
PREDICTOR_COLUMNS_SEQ <- 1:4 # indices of the predictor columns
TARGET_COLUMN_ID <- 5 # index of the target column

EMVC_MIN_TRUST <- 0.9 # minimum acceptable class-membership probability computed by EMVC, from 0 to 1

# transpose so that rows are predictors and columns are training examples
emvcData <- t(as.matrix(trainTable[,PREDICTOR_COLUMNS_SEQ]))
# build a one-hot annotation matrix: one row per class, one column per training example
emvcAnnotations <- as.numeric(trainTable[,TARGET_COLUMN_ID])
emvcAnnotationsUnique <- unique(emvcAnnotations)
emvcAnnotationsMatrix <- matrix(0, ncol=ncol(emvcData), nrow = length(emvcAnnotationsUnique))
for(i in 1:length(emvcAnnotationsUnique)){
  emvcAnnotationsMatrix[i, emvcAnnotations == emvcAnnotationsUnique[i]] <- 1
}

set.seed(0)
# run EMVC; the commented-out arguments below are left at their package defaults
emvcResult <- EMVC(data = emvcData,
                   annotations = emvcAnnotationsMatrix,
                   #  bootstrap.iter = 20,
                   k.range = 2
                   #  clust.method = "kmeans",
                   #  kmeans.nstart = 1,
                   #  kmeans.iter.max = 10,
                   #  hclust.method = "average",
                   #  hclust.cor.method = "spearman"
)

# collect training examples whose highest class-membership estimate is below the threshold
badSamples <- c()
for(i in 1:ncol(emvcResult)){
  if(max(emvcResult[,i])<EMVC_MIN_TRUST){
    badSamples <- c(badSamples, i)
  }
}
cat("Indexes of bad train samples:", badSamples,"\n") #Это  номера строк в обучающей табличке которые повышают кросс-энтропию данных. Они противоречат большинству других обучающих примеров, и возможно следует их удалить из обучающей таблички
trainTable <- trainTable[-badSamples,]

 
Dr. Trader:

EMVC does not do what I wanted; it does not do what it appears to from a cursory reading of the description.

EMVC takes a table with predictors and a target (classes only, no regression) and calculates the probability that each training example really belongs to its assigned class. This way you can find the rows in the training table that contradict most of the other training examples (outliers, errors) and remove them so as not to confuse the model during training.

I assumed it would be possible to find a set of predictors that gives the highest probability estimates, but the sets of predictors it found were unsatisfactory. I will not experiment with this any further; there are better tools for selecting predictors. I also cannot get at the cross-entropy estimate: the package uses it internally in some way but does not return that value to the user.

But at least we got an interesting tool for screening out training examples rather than predictors.


Too bad.

Once again you have confirmed the idea that miracles do not happen and everything has to be picked out by hand.

 
Dr. Trader:
This way you can find the rows in the training table that contradict most of the other training examples (outliers, errors) and remove them so as not to confuse the model during training.

Does this really need to be done on forex data, where regularities are hard to find anyway? It seems to me such a program could throw out half of the examples. And outliers can be handled with simpler methods: do not delete them but, for example, clip them to the maximum allowed value.
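A rough sketch of the clip-instead-of-delete idea in R; the clipOutliers name and the 1%/99% quantile limits are just my choice for the example.

# winsorize: values beyond the chosen quantile limits are set to the limits,
# so extreme rows stay in the table instead of being removed
clipOutliers <- function(x, probs = c(0.01, 0.99)) {
  q <- quantile(x, probs = probs, na.rm = TRUE)
  pmin(pmax(x, q[1]), q[2])
}

data(iris)
trainTable <- iris
PREDICTOR_COLUMNS_SEQ <- 1:4
trainTable[PREDICTOR_COLUMNS_SEQ] <- lapply(trainTable[PREDICTOR_COLUMNS_SEQ], clipOutliers)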
