A package that, if I understand correctly, can separate time series that can be predicted from those that cannot:
http://www.gmge.org/2012/05/foreca-forecastable-component-analysis/
http://www.gmge.org/2015/01/may-the-forec-be-with-you-r-package-foreca-v0-2-0/
Everyone is welcome to try. The z1 archive contains two files, train and test. For Target, build a model on train, apply it to test,
and post the results as the % of successfully predicted cases for both samples (train = xx%, test = xx%). There is no need to disclose
methods or models, just the numbers. Any data manipulation and mining methods are allowed.
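For anyone taking part, the two requested numbers could be computed roughly like this. It is only a minimal sketch: the helper name `percent_correct` and the toy label lists are made up for illustration and are not taken from the z1 archive. Real predictions would come from whatever model is fitted on train.

```python
def percent_correct(actual, predicted):
    """Share of cases where the predicted label equals the actual one, in %."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("no cases to score")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


# Toy labels only (hypothetical); they are NOT the contents of train/test.
train_actual = [0, 1, 1, 0, 1, 0, 0, 1]
train_predicted = [0, 1, 1, 0, 0, 0, 1, 1]
test_actual = [1, 0, 0, 1]
test_predicted = [1, 1, 0, 0]

# Build the line in the format the task asks for.
report = "train = {:.0f}%, test = {:.0f}%".format(
    percent_correct(train_actual, train_predicted),
    percent_correct(test_actual, test_predicted),
)
print(report)
```

The important part is that the model is fitted on train only and the test file is touched once, for scoring.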
1. None of your predictors has any predictive power. Without exception, they are all noise.
2. Three models were built: rf, ada, SVM. Here are the results:
rf
Call:
randomForest(formula = TFC_Target ~ .,
data = crs$dataset[crs$sample, c(crs$input, crs$target)],
ntree = 500, mtry = 3, importance = TRUE, replace = FALSE, na.action = randomForest::na.roughfix)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 3
OOB estimate of error rate: 49.71%
Confusion matrix:
       [0,0] (0,1] class.error
[0,0]    197   163   0.4527778
(0,1]    185   155   0.5441176
ada
Call:
ada(TFC_Target ~ ., data = crs$dataset[crs$train, c(crs$input,
crs$target)], control = rpart::rpart.control(maxdepth = 30,
cp = 0.01, minsplit = 20, xval = 10), iter = 50)
Loss: exponential Method: discrete Iteration: 50
Final Confusion Matrix for Data:
Final Prediction
True value (0,1] [0,0]
(0,1] 303 37
[0,0] 29 331
Train Error: 0.094
Out-Of-Bag Error: 0.157 iteration= 50
SVM
Summary of the SVM model (built using ksvm):
Support Vector Machine object of class "ksvm"
SV type: C-svc (classification)
parameter : cost C = 1
Gaussian Radial Basis kernel function.
Hyperparameter : sigma = 0.12775132444179
Number of Support Vectors : 662
Objective Function Value : -584.3646
Training error : 0.358571
Probability model included.
Time taken: 0.17 secs
On the test set (I mean rattle's, not yours):
Error matrix for the Ada Boost model on test.csv [validate] (counts):
Predicted
Actual (0,1] [0,0]
[0,0] 33 40
(0,1] 35 42
Error matrix for the Ada Boost model on test.csv [validate] (proportions):
Predicted
Actual (0,1] [0,0] Error
[0,0] 0.22 0.27 0.55
(0,1] 0.23 0.28 0.45
Overall error: 50%, Averaged class error: 50%
Rattle timestamp: 2016-08-08 15:48:15 user
======================================================================
Error matrix for the Random Forest model on test.csv [validate] (counts):
Predicted
Actual [0,0] (0,1]
[0,0] 44 29
(0,1] 44 33
Error matrix for the Random Forest model on test.csv [validate] (proportions):
Predicted
Actual [0,0] (0,1] Error
[0,0] 0.29 0.19 0.40
(0,1] 0.29 0.22 0.57
Overall error: 49%, Averaged class error: 48%
Rattle timestamp: 2016-08-08 15:48:15 user
======================================================================
Error matrix for the SVM model on test.csv [validate] (counts):
Predicted
Actual [0,0] (0,1]
[0,0] 41 32
(0,1] 45 32
Error matrix for the SVM model on test.csv [validate] (proportions):
Predicted
Actual [0,0] (0,1] Error
[0,0] 0.27 0.21 0.44
(0,1] 0.30 0.21 0.58
Overall error: 51%, Averaged class error: 51%
Rattle timestamp: 2016-08-08 15:48:15 user
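The "Overall error" and "Averaged class error" lines can be reproduced from the counts by hand. Here is a small sketch of that arithmetic. The function name `error_summary` is my own, and the counts are copied from the Random Forest and SVM matrices above (rows are actual, columns are predicted, both in the order [0,0], (0,1]). The Ada Boost matrix is left out because its columns are printed in the opposite order.

```python
def error_summary(matrix):
    """matrix[i][j] = number of cases with actual class i predicted as class j."""
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n_classes))
    # Per-class error: share of each actual class that was misclassified.
    class_errors = [1 - matrix[i][i] / sum(matrix[i]) for i in range(n_classes)]
    overall = 1 - correct / total
    averaged = sum(class_errors) / n_classes
    return overall, averaged, class_errors


# Counts copied from the test-set error matrices in this post.
rf_test = [[44, 29], [44, 33]]
svm_test = [[41, 32], [45, 32]]

for name, counts in (("Random Forest", rf_test), ("SVM", svm_test)):
    overall, averaged, _ = error_summary(counts)
    print(f"{name}: overall error {overall:.0%}, averaged class error {averaged:.0%}")
```

Rounded to whole percent this gives 49% and 48% for the random forest and 51% and 51% for the SVM, the same figures rattle reports.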
The ROC analysis for the random forest confirms the above.
Conclusion.
Your set of predictors is hopeless.
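One rough way to see why roughly 50% error on 150 test cases counts as "noise" is to compare the number of hits with plain coin flipping. The sketch below is only an illustration under my own assumptions: the helper `coin_flip_p_value` is hypothetical, it uses an exact two-sided binomial test against p = 0.5, and it treats the two classes as about equally frequent (73 vs 77 cases in the matrices above). The hit counts are the diagonals of the test matrices: 77 for the random forest, 73 for the SVM, and 75 for Ada Boost (from its 50% overall error).

```python
from math import comb


def coin_flip_p_value(hits, n):
    """Exact two-sided binomial p-value for `hits` successes in n fair coin flips."""
    distance = abs(hits - n / 2)
    # Sum the probabilities of all outcomes at least as far from n/2 as `hits`.
    tail = sum(comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= distance)
    return tail / 2 ** n


n_test = 150
# Correctly predicted test cases, taken from the error matrices in this post.
hits_by_model = {"Random Forest": 77, "Ada Boost": 75, "SVM": 73}

for name, hits in hits_by_model.items():
    p = coin_flip_p_value(hits, n_test)
    verdict = "looks like coin flipping" if p > 0.05 else "differs from coin flipping"
    print(f"{name}: {hits}/{n_test} correct, p = {p:.3f} ({verdict})")
```

A large p-value here means the result cannot be told apart from guessing. All three models land in that range, and exactly 75 hits out of 150 gives a p-value of 1.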
Extremely curious.
The package is installed, documentation is available.
Maybe someone will try it and post the result?
Wouldn't that require a pre-screening?
Guys, take it!
"post results in % (successfully predicted cases) for both samples (train = xx%, test = xx%). Methods and models don't need to be announced, only numbers".
We are waiting for more results. It will be interesting to see what conclusions Mihail Marchukajtes reaches.
Okay)))) but read the conditions carefully.
You don't need a test!
The model cannot be trained! You can't test an empty space.
