Discussion of article "Deep Neural Networks (Part II). Working out and selecting predictors"

 

New article Deep Neural Networks (Part II). Working out and selecting predictors has been published:

The second article of the series about deep neural networks considers the transformation and selection of predictors during data preparation for training a model.

Now, we want to see the distribution of NA in variables after the outliers have been removed.

require(VIM)  # VIM provides aggr() for visualizing missing-value patterns
evalq(a <- aggr(x.sin.out), env)  # plot the count and pattern of NA in the outlier-cleaned set


Fig.6. Distribution of NA in the data set

Author: Vladimir Perervenko

 

Interesting thing:

Minimum-magnitude pruning is a simple-to-use algorithm in which, at each step, the weights with the smallest absolute value are switched off. This algorithm requires retraining the network at almost every step and gives suboptimal results.

Am I understanding the order of operation of this function correctly?

1) Fully train the original 12-8-5-1 network

2) Find the connection with the minimum weight and remove the corresponding input

3) Retrain the 11-8-5-1 network without the removed input

And so on, for several dozen retraining cycles, until only a 6-2-1-1 network remains.

It seems to me that the time spent on such an elimination of insignificant weights, inputs and internal neurons will be much greater than the time of a single full training (which we did in step 1).

What are the advantages of this approach?
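
For reference, here is a minimal sketch of such a magnitude-pruning loop as I understand it. It is only my illustration with the nnet package (which supports a single hidden layer, so the 12-8-5-1 structure is simplified), not the function discussed in the article:

# Toy data: 12 inputs, binary target
library(nnet)
set.seed(42)
x <- matrix(rnorm(200 * 12), ncol = 12)
y <- as.numeric(x[, 1] + x[, 2] > 0)

# Step 1: fully train the network
net  <- nnet(x, y, size = 8, maxit = 200, trace = FALSE)
mask <- rep(TRUE, length(net$wts))   # TRUE = weight is still trainable

for (i in 1:10) {                    # a few pruning cycles
  active  <- which(mask)
  # Step 2: find the connection with the smallest absolute weight
  weakest <- active[which.min(abs(net$wts[active]))]
  net$wts[weakest] <- 0
  mask[weakest]    <- FALSE          # freeze the pruned weight at zero
  # Step 3: retrain the pruned network
  net <- nnet(x, y, size = 8, Wts = net$wts, mask = mask,
              maxit = 50, trace = FALSE)
}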

 
elibrarius:

Interesting thing:

Am I understanding the order of operation of this function correctly?

1) Fully train the original 12-8-5-1 network

2) Find the connection with the minimum weight and remove the corresponding input

3) Retrain the 11-8-5-1 network without the removed input

And so on, for several dozen retraining cycles, until only a 6-2-1-1 network remains.

It seems to me that the time spent on such an elimination of insignificant weights, inputs and internal neurons will be much greater than the time of a single full training (which we did in step 1).

What are the advantages of this approach?

1. The algorithm works exactly like this, with one exception: neurons are eliminated in all hidden layers as well.

2. A minimal set of inputs and a minimal structure are determined that give the same result as the full set.

Advantages? We remove everything unnecessary that generates false classifications. That is what the developers claim.

It is just one way of selecting important predictors.

Good luck

 
Vladimir Perervenko:

1. The algorithm works exactly like this, with one exception: neurons are eliminated in all hidden layers as well.

2. A minimal set of inputs and a minimal structure are determined that give the same result as the full set.

Advantages? We remove everything unnecessary that generates false classifications. That is what the developers claim.

It is just one way of selecting important predictors.

Good luck

1) If an input has no remaining connections to the internal neurons, then that input itself can be switched off.

2) I am confused by spending many times more time than simply training the full model as in step 1. If the result is the same, why spend so much time?

I can assume that the eliminated predictors will be ignored during future retraining, and that is where the time saving comes in. But the importance of predictors can also change over time.

I was interested in this trick because I had started to do the same thing, but gave up after realising how much time it takes.

Perhaps the elimination cycles can tolerate a larger error and fewer training epochs than the final training.


I wonder what logic is used to screen out hidden neurons? Each neuron has many input connections. By the minimum sum of input weights? Or the minimum sum of output weights? Or the total sum?
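
For illustration, this is how the three candidate criteria could be computed on hypothetical weight matrices (just a sketch, not code from any particular package):

set.seed(1)
W_in  <- matrix(rnorm(12 * 8), nrow = 12)  # input -> hidden layer weights
W_out <- matrix(rnorm(8 * 5),  nrow = 8)   # hidden -> next layer weights

in_score    <- colSums(abs(W_in))          # sum of |incoming weights| per hidden neuron
out_score   <- rowSums(abs(W_out))         # sum of |outgoing weights| per hidden neuron
total_score <- in_score + out_score        # combined criterion
which.min(total_score)                     # the weakest neuron under the combined rule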

 
elibrarius:

1) If an input has no remaining connections to the internal neurons, then that input itself can be switched off.

2) I am confused by spending many times more time than simply training the full model as in step 1. If the result is the same, why spend so much time?

I can assume that the eliminated predictors will be ignored during future retraining, and that is where the time saving comes in. But the importance of predictors can also change over time.

I was interested in this trick because I had started to do the same thing, but gave up after realising how much time it takes.

Perhaps the elimination cycles can tolerate a larger error and fewer training epochs than the final training.


I wonder what logic is used to screen out hidden neurons? Each neuron has many input connections. By the minimum sum of input weights? Or the minimum sum of output weights? Or the total sum?

Look at the package and the function description. I haven't looked into it in depth, but in several models (H2O, for example) this is how the importance of predictors is determined. I checked it and did not find it reliable.

Of course, the importance of predictors changes over time. But if you have read my articles, you must have noticed that I strongly recommend retraining the model regularly, whenever its quality drops below a pre-defined limit.

This is the only correct way. IMHO
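
As a rough sketch of such a rule (all names here are hypothetical, not code from the articles):

quality_limit <- 0.65                     # pre-defined quality limit (accuracy)

check_and_retrain <- function(model, x_new, y_new, fit_fun) {
  pred <- round(predict(model, x_new))    # predicted class labels (0/1)
  acc  <- mean(pred == y_new)             # current quality on fresh data
  if (acc < quality_limit)
    model <- fit_fun(x_new, y_new)        # quality degraded - refit the model
  model
}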

Good luck


 
Vladimir Perervenko:

Look at the package and the function description. I haven't looked into it in depth, but in several models (H2O, for example) this is how the importance of predictors is determined. I checked it and did not find it reliable.

Of course, the importance of predictors changes over time. But if you have read my articles, you must have noticed that I strongly recommend retraining the model regularly, whenever its quality drops below a pre-defined limit.

This is the only correct way. IMHO

Good luck


Thank you!
 

Wouldn't it be better to feed the hour and day data into the neural network not as a single predictor, but as separate predictors for each hour and day value?

If there is only one, then the weight/value of Monday (1) and Tuesday (2) will differ by 100%, while Thursday (4) and Friday (5) differ by only 20%. With hours 1, 2 and 22, 23 the difference is even stronger. And the jump from 5 back to 1, or from 23 to 1, would be a huge change in value altogether.

That is, there will be distortions in the significance of days and hours if they are represented by a single predictor.

Five and 24 extra predictors is a lot. But since the sequence of days and hours is cyclical, they can be converted into an angle on a circle and treated the same way as ordinary angles: "It makes more sense to give the sine and cosine of this angle as inputs." That is, there will be two predictors each for hours and days. The idea is taken from here: http://megaobuchalka.ru/9/5905.html
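
A small sketch of this encoding:

# Map hour of day onto a circle, so that 23:00 and 0:00 end up close together
hour     <- 0:23
hour_sin <- sin(2 * pi * hour / 24)
hour_cos <- cos(2 * pi * hour / 24)

# The same for the five trading days (Mon = 1 .. Fri = 5)
day     <- 1:5
day_sin <- sin(2 * pi * (day - 1) / 5)
day_cos <- cos(2 * pi * (day - 1) / 5)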
 
elibrarius:

Wouldn't it be better to feed the hour and day data into the neural network not as a single predictor, but as separate predictors for each hour and day value?

If there is only one, then the weight/value of Monday (1) and Tuesday (2) will differ by 100%, while Thursday (4) and Friday (5) differ by only 20%. With hours 1, 2 and 22, 23 the difference is even stronger. And the jump from 5 back to 1, or from 23 to 1, would be a huge change in value altogether.

That is, there will be distortions in the significance of days and hours if they are represented by a single predictor.

Five and 24 extra predictors is a lot. But since the sequence of days and hours is cyclical, they can be converted into an angle on a circle and treated the same way as ordinary angles: "It makes more sense to give the sine and cosine of this angle as inputs." That is, there will be two predictors each for hours and days. The idea is taken from here: http://megaobuchalka.ru/9/5905.html

Hour of day and day (of the week, month, year) are nominal variables, not numeric ones. We can only discuss whether they are ordered or not. So thanks for the suggestion, but it is not accepted.

Use these variables as numeric ones? You can experiment, but I am not looking in that direction. If you get any results, please share.

Good luck

 
I read the article. The first part contains a lot of predictor transformations, which is certainly informative, but I would like to see two models, with and without the transformations, to evaluate how effective all these transformations are. Also, what is the point of striving for a normal distribution?
 

Discussion of and questions about the code can be posted in the thread.

Good luck

 
Doesn't the R package funModeling have the bayesian_plot() function?