Preventing overfitting

 

I read the article

https://www.mql5.com/en/articles/652 - MQL5 Cookbook: Reducing the Effect of Overfitting and Handling the Lack of Quotes

and actually tested some of its results myself, but they are not especially convincing (one of the article's assumptions, that forward testing is unnecessary, is in my opinion simply wrong). So I was wondering: what kinds of techniques can be used to prevent overfitting?

MQL5 Cookbook: Reducing the Effect of Overfitting and Handling the Lack of Quotes
  • 2013.09.24
  • Anatoli Kazharski
  • www.mql5.com
Whatever trading strategy you use, there will always be a question of what parameters to choose to ensure future profits. This article gives an example of an Expert Advisor that can optimize the parameters of multiple symbols at the same time. This method is intended to reduce the effect of overfitting parameters and to handle situations where data from a single symbol are not enough for the study.
 
jlwarrior:

I read the article

https://www.mql5.com/en/articles/652 - MQL5 Cookbook: Reducing the Effect of Overfitting and Handling the Lack of Quotes

and actually tested some of its results myself, but they are not especially convincing (one of the article's assumptions, that forward testing is unnecessary, is in my opinion simply wrong). So I was wondering: what kinds of techniques can be used to prevent overfitting?

Don't optimize at all.
 
I understand that one technique is to reduce the number of parameters to be optimized to a minimum. But I might be wrong!
 
OK. Based on my EA design, I optimise the inputs over a range that is logical, not over a big range.

Logical in the sense of how the EA should behave.

Optimizing over a big range is curve fitting. 
 
doshur:
OK. Based on my EA design, I optimise the inputs over a range that is logical, not over a big range.

Logical in the sense of how the EA should behave.

Optimizing over a big range is curve fitting. 
Sorry, but I don't understand what you mean by a "big range". Can you give an example?
 
Yes, I don't understand either. If you can be more precise, I'll be grateful!
 
OK. For example, suppose my strategy is to sell/buy when the daily range exceeds a certain number of pips, so I use the ATR on the daily timeframe. When optimising, I input values from 80 to 120 percent of the daily ATR. I don't optimise from 10 to 300 to find curve-fitted input parameters.

This might not be a good example, but it's something along those lines.
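A minimal sketch of this idea in Python, assuming the threshold is expressed as a percentage of the daily ATR and stepped in 5% increments. The step size, the 100-pip ATR in the usage lines, and the reading of "10 to 300" as percentages of ATR are all illustrative assumptions; `atr_relative_grid` is a hypothetical helper, and in a real EA the ATR would come from the terminal's indicator rather than a constant.

```python
def atr_relative_grid(daily_atr_pips, low_pct=80, high_pct=120, step_pct=5):
    """Candidate thresholds in pips, each a percentage of the daily ATR.

    Hypothetical helper: the percentage bounds and step are what gets
    optimized; the ATR value itself is assumed to be supplied by the caller.
    """
    return [daily_atr_pips * pct / 100.0
            for pct in range(low_pct, high_pct + 1, step_pct)]


if __name__ == "__main__":
    daily_atr = 100.0  # illustrative value only, not real market data
    narrow = atr_relative_grid(daily_atr)            # 80%..120%: a "logical" range
    wide = atr_relative_grid(daily_atr, 10, 300, 5)  # 10%..300%: a big range
    print(len(narrow), narrow[0], narrow[-1])
    print(len(wide))
```

The narrow grid has 9 candidates against 59 for the wide one, and because every candidate scales with the ATR, the same percentages keep their meaning when volatility changes.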
 
RaptorUK:
Don't optimize at all.

That sounds like killing a bug with a bazooka, doesn't it? I mean, instead of finding a reasonable solution, you just don't use an available tool... Also, if that is the answer, why does almost every EA presented in the Market provide optimization results?

 

Candles:
I understand that one technique is to reduce the number of parameters to be optimized to a minimum. But I might be wrong!

I absolutely agree. In fact, what I'm trying to do is derive the values of several parameters by tuning just one. For instance, setting the MACD slow period to fast period * 2, but I do not know whether this is the right path.
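Tying the slow MACD period to the fast one can be sketched as follows, assuming a fixed ratio of 2 and illustrative period ranges (fast 5 to 20, slow 10 to 40, both hypothetical). The point is the size of the search space: optimizing both periods independently explores every valid pair, while the tied version explores only one dimension.

```python
from itertools import product


def independent_grid(fast_values, slow_values):
    """Every (fast, slow) pair with fast < slow: two parameters optimized separately."""
    return [(f, s) for f, s in product(fast_values, slow_values) if f < s]


def tied_grid(fast_values, ratio=2):
    """Only the fast period is optimized; the slow period is derived as fast * ratio."""
    return [(f, f * ratio) for f in fast_values]


if __name__ == "__main__":
    fast = range(5, 21)   # hypothetical fast-period range
    slow = range(10, 41)  # hypothetical slow-period range
    print(len(independent_grid(fast, slow)))  # pairs to test without the tie
    print(len(tied_grid(fast)))               # pairs to test with slow = 2 * fast
```

With these ranges the independent search has 430 combinations and the tied one 16. Whether a ratio of 2 is a sensible tie is a judgment about the strategy, not something the grid itself can tell you.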

 

doshur:
OK. For example, suppose my strategy is to sell/buy when the daily range exceeds a certain number of pips, so I use the ATR on the daily timeframe. When optimising, I input values from 80 to 120 percent of the daily ATR. I don't optimise from 10 to 300 to find curve-fitted input parameters.

This might not be a good example, but it's something along those lines.

Again, another logical answer.

What about the range of history data used for optimization? If I'm trying to develop an EA that should run without modifications for a month, how much history data should be used to optimize it?

 
jlwarrior:

That sounds like killing a bug with a bazooka, doesn't it? I mean, instead of finding a reasonable solution, you just don't use an available tool... Also, if that is the answer, why does almost every EA presented in the Market provide optimization results?

Isn't it obvious? It's because they are all curve fitted.

OK, I admit my response was an extreme one, but I think it would be better to take my extreme, no optimization, over the other extreme of optimizing as many inputs as possible, even when it makes no logical sense to optimize them.

Look at it this way: optimization is just that, optimizing something that is already good to make it even better. It's not meant to make a losing strategy win...
 
When you use logical optimization, you will find that the optimal input values don't change much, no matter how long the test period is.

Then you will realise your EA is very stable and not curve fitted.
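One way to check this claim is to optimize the same input separately on several history periods and see how far the best value moves. A minimal Python sketch, assuming each period's optimization results are available as a mapping from input value to net profit; the function names are hypothetical and the numbers in the usage example are made up purely for illustration.

```python
def best_value(results):
    """Input value with the highest profit in one period's optimization results."""
    return max(results, key=results.get)


def stability(results_by_period):
    """Best input value per period, and the spread (max - min) of those best values."""
    bests = [best_value(r) for r in results_by_period]
    return bests, max(bests) - min(bests)


if __name__ == "__main__":
    # Made-up profits for an ATR-percentage input over three periods.
    periods = [
        {90: 120.0, 100: 150.0, 110: 130.0},
        {90: 80.0, 100: 95.0, 110: 90.0},
        {90: 200.0, 100: 210.0, 110: 240.0},
    ]
    bests, spread = stability(periods)
    print(bests, spread)
```

A small spread relative to the tested range suggests a stable input; a best value that jumps from one end of the range to the other between periods is a warning sign of curve fitting.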
 

Nice article. I agree with RaptorUK's first comment, "Don't optimize at all". Otherwise, you'll run into problems explaining how to optimize properly. Most of us would agree that curve-fitting exists; however, most would also disagree about how to avoid it.

Someone can take a simple SMA crossover on price and create a lot of optimization switches, for example Periods, BarCounts, Stoploss, TakeProfit, TimeOfDay, etc. Each of those can end up having about 100 steps. That's 100^5 (ten billion) different combinations with just the five above, and voilà, you have a profitable SMA strategy which more than likely wouldn't be profitable in live testing.
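The arithmetic, and why it is dangerous, can be sketched in Python. With 100 steps on each of five inputs a full grid has 100**5 combinations. The second function is a hypothetical simulation, not a model of any real strategy: every "strategy" is pure noise with zero true edge (trade results drawn from a normal distribution with mean 0), yet the one picked for its best in-sample average looks profitable, while its out-of-sample average is just noise around zero.

```python
import random


def grid_size(steps_per_input, n_inputs):
    """Number of combinations a full grid search has to test."""
    return steps_per_input ** n_inputs


def best_of_noise(n_strategies=1000, n_trades=50, seed=1):
    """Pick the best of many zero-edge 'strategies' in-sample, then re-test it.

    Each trade result is drawn from N(0, 1), so no strategy has a real edge.
    Returns (in_sample_mean, out_of_sample_mean) of the selected strategy.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_strategies):
        in_sample = sum(rng.gauss(0.0, 1.0) for _ in range(n_trades)) / n_trades
        out_of_sample = sum(rng.gauss(0.0, 1.0) for _ in range(n_trades)) / n_trades
        if best is None or in_sample > best[0]:
            best = (in_sample, out_of_sample)
    return best


if __name__ == "__main__":
    print(grid_size(100, 5))  # 10000000000
    in_mean, out_mean = best_of_noise()
    print(f"in-sample mean {in_mean:.3f}, out-of-sample mean {out_mean:.3f}")
```

The more combinations are tried, the higher the best in-sample result climbs by luck alone, which is exactly what a ten-billion-point grid exploits.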

There is so much different material on how to optimize that it brings me back to the KISS principle.

  • As doshur suggests, you can try optimizing over a small range around your starting values.
  • Some people might suggest larger steps: instead of stepping by 1, use 10 or 100, etc.
  • Some people might suggest you optimize on a small date range and then walk forward on a large date range.
  • Some people might suggest you optimize on a large date range and then walk forward on a small date range.
  • Some people might suggest you optimize on all available data.
  • Some people might suggest you need a large number of trades.

Which method someone uses usually depends on their forex world view. One of the popular views on optimization is bullet #4, where you optimize over about 4 months and then run a back-test on the 2 weeks or so following the optimization end date. Obviously, someone using such a method believes that market behaviors are short-lived and need constant re-optimization to stay on track. And that's just one example of a forex world view.
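The 4-month / 2-week scheme can be sketched as rolling walk-forward windows. A minimal Python version, assuming "about 4 months" is approximated as 120 days and "2 weeks" as 14 days, and that each window rolls forward by one test period; `walk_forward_windows` is a hypothetical name and the dates in the usage lines are arbitrary. Each tuple is (optimization start, optimization end = test start, test end).

```python
from datetime import date, timedelta


def walk_forward_windows(start, end, train=timedelta(days=120), test=timedelta(days=14)):
    """Rolling (train_start, train_end, test_end) windows between start and end.

    The optimization period is [train_start, train_end) and the forward test
    is [train_end, test_end); after each window everything moves forward by `test`.
    """
    windows = []
    train_start = start
    while train_start + train + test <= end:
        train_end = train_start + train
        windows.append((train_start, train_end, train_end + test))
        train_start += test
    return windows


if __name__ == "__main__":
    for w in walk_forward_windows(date(2013, 1, 1), date(2013, 12, 31))[:3]:
        print(*w)
```

Re-optimizing on every training window and stitching together only the forward-test results gives a performance record built from data the optimizer never saw.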

Here is what I believe about optimization (should "don't optimize" not work for you). I'll have to use an example with blackjack simulators to illustrate. Let's say I wanted to know a good count at which to take insurance while counting cards. So I simulate a couple of million hands of blackjack, and the results come out as follows.

0__ 

1_______

2______________ (best count for insurance because it has the longest bar)

3_______ 

4__ 

As you can see, 2 gives the best result because it has the longest bar, i.e. the most wins at that count. But what's important here is that its neighbours, 1 and 3, are the second best, followed by 0 and 4 on both wings. This idea of a cluster is what gives me confidence to select 2 as optimal.

 

However, with most forex optimizations, the results don't form that curve shape with a peak.

0__

1______________ 

2__

3______

4______

Clearly, 1 gives the best result, but this doesn't inspire much confidence in me, because its neighbours 0 and 2 are among the weakest results.
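The cluster idea can be turned into a simple number: how strongly the best value's immediate neighbours support it. A minimal Python sketch, assuming the results are positive scores (such as win counts) listed in parameter order, and taking the bar lengths of the two charts above as the counts (2, 7, 14, 7, 2 and 2, 14, 2, 6, 6). The `neighbour_support` name is hypothetical, and any cut-off you would apply to its result is a judgment call.

```python
def neighbour_support(results):
    """Return (best_index, support) for a list of positive scores in parameter order.

    support = average score of the immediate neighbours / score of the best value.
    Near 1 means a broad plateau, near 0 means an isolated spike.
    """
    best = max(range(len(results)), key=results.__getitem__)
    if results[best] <= 0:
        raise ValueError("scores are assumed to be positive")
    neighbours = [results[j] for j in (best - 1, best + 1) if 0 <= j < len(results)]
    if not neighbours:
        return best, 0.0
    return best, sum(neighbours) / len(neighbours) / results[best]


if __name__ == "__main__":
    blackjack = [2, 7, 14, 7, 2]  # peaked cluster around count 2
    forex = [2, 14, 2, 6, 6]      # isolated spike at value 1
    print(neighbour_support(blackjack))
    print(neighbour_support(forex))
```

The blackjack chart gives a support of 0.5 around count 2, while the forex chart's best value, 1, gets only about 0.14. Looking just one step either side is a simplification; a wider neighbourhood works the same way.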

 

To summarize: don't optimize. Observe the market or think up a strategy, code it, and it should work on the first run; otherwise, think up something else. If it does work on the first run, then look into optimizing, taking the cluster effect into consideration. Use all available data for testing, accept any weaknesses, and move forward.
