90%??

 

I have been using the Alpari M1 data to backtest with 90% quality. I decided that I would like to get better data, so I purchased some tick data and wrote a script to convert it to FXT, based on the standard script on the MQL website. Now the FXT file generated from the Alpari data for 2 years on EURUSD was 17,000 KB, while the file generated from the tick data by my script is 190,000 KB. This makes me wonder where the 90% figure comes from... the Alpari file is about 9% of the size of the tick-generated file. I had been finding that all my experts work great in the intervals they were optimized for, but bombed outside those intervals. Basically I was getting overfitting, even on 1 or 2 year intervals at 90% quality, which makes me wonder how effective '90%' quality data is. Anyway, I could have made a mistake; it's early days yet, and forward testing will reveal all...

 
Craig:
I have been using the Alpari M1 data to backtest with 90% quality. I decided that I would like to get better data, so I purchased some tick data and wrote a script to convert it to FXT, based on the standard script on the MQL website. Now the FXT file generated from the Alpari data for 2 years on EURUSD was 17,000 KB, while the file generated from the tick data by my script is 190,000 KB. This makes me wonder where the 90% figure comes from... the Alpari file is about 9% of the size of the tick-generated file. I had been finding that all my experts work great in the intervals they were optimized for, but bombed outside those intervals. Basically I was getting overfitting, even on 1 or 2 year intervals at 90% quality, which makes me wonder how effective '90%' quality data is. Anyway, I could have made a mistake; it's early days yet, and forward testing will reveal all...

Why not just use the build 200 data? It's superior to the Alpari data. If you compare the two, you will get more ticks modelled from the build 200 data.

 

You mean the automatic download thing in the History Center? If so, you need to check your data more closely: last time I checked (about 2 weeks ago) it was missing a whole month for this year. This was in fact the straw that broke the camel's back for me and drove me to seek an alternate source of data.

 
Craig:
You mean the automatic download thing in the History Center? If so, you need to check your data more closely: last time I checked (about 2 weeks ago) it was missing a whole month for this year. This was in fact the straw that broke the camel's back for me and drove me to seek an alternate source of data.

Was it October or November? The history centre data from build 200 is only January 3, 1999 thru September 29, 2006. As far as I know, all data for that range of dates is there.

 

Just tried downloading again; the gap runs from 2006.9.30 to 2006.12.14, so backtesting on this is hardly going to be accurate. Even if it did not have the hole, you're lucky if you are getting more than a tick every couple of minutes. I would not use this data on anything under the H4 timeframe.
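
For anyone who wants to check their own history for holes like this, here is a minimal sketch in Python. It assumes you have already exported the bar times from the History Center into a list of datetimes; the sample bar times and the 3-day threshold (chosen so ordinary weekends are not flagged) are my own assumptions, not anything MetaTrader defines.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, max_gap=timedelta(days=3)):
    """Return (last_bar_before, first_bar_after) pairs for every hole
    longer than max_gap between consecutive bars."""
    ordered = sorted(timestamps)
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > max_gap:
            gaps.append((prev, curr))
    return gaps

# Hypothetical bar times, made up to mimic the hole described above.
bars = [
    datetime(2006, 9, 28, 12, 0),
    datetime(2006, 9, 29, 12, 0),
    datetime(2006, 12, 14, 0, 0),
    datetime(2006, 12, 14, 0, 1),
]

for start, end in find_gaps(bars):
    print("gap:", start, "->", end)
```

Lower the threshold if you also want to catch missing days inside a week, but then weekends and holidays will show up too.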

 
Craig:
Just tried downloading again; the gap runs from 2006.9.30 to 2006.12.14, so backtesting on this is hardly going to be accurate. Even if it did not have the hole, you're lucky if you are getting more than a tick every couple of minutes. I would not use this data on anything under the H4 timeframe.

Of course, that's exactly where the gap is supposed to be! They don't supply data after September 29, 2006. The build 200 data is January 3, 1999 thru September 29, 2006 inclusive. To fill that hole you either have to go without or get it from the broker you connect to.

What I did was clear all my history. Then I opened up an M1 chart and pressed the left button until all the data to September was loaded into MetaTrader. Then I went into the history and downloaded the build 200 data. Then I converted it to the other timeframes.

Note that I can only get 90% model quality from April 1999 (or about, haven't tried other dates) to September 29, 2006.

Also note I don't back test with dates after Sept 29.

BTW, over June 16, 2004 thru September 29, 2006, my EAs perform (much) better with Alpari data than with build 200 data. That's why I stick to build 200 data.

 

I'm not sure I understand why the data is only in a certain range; the subject of this gap has come up on other forums and nobody had an answer, but anyway... Have you tried your EAs on out-of-sample data? I found I was getting terrible overfitting on the Alpari data due to its rather 'sparse' nature.

 
Craig:
I'm not sure I understand why the data is only in a certain range; the subject of this gap has come up on other forums and nobody had an answer, but anyway... Have you tried your EAs on out-of-sample data? I found I was getting terrible overfitting on the Alpari data due to its rather 'sparse' nature.

MetaQuotes is working on this gap issue.

Cheers,

Diam0nd


 
Craig:
I'm not sure I understand why the data is only in a certain range; the subject of this gap has come up on other forums and nobody had an answer, but anyway... Have you tried your EAs on out-of-sample data? I found I was getting terrible overfitting on the Alpari data due to its rather 'sparse' nature.

Which one do you think has sparse data? I say it's Alpari. If you compare the number of ticks modelled in the two, then build 200 comes out ahead by quite a big margin.

 

I think you're missing my point. I'm not trying to deter you from using the downloaded data and backtesting; I'm sure this is a useful exercise. What I am saying is that when you compare 1-minute data to tick data in terms of volume, you realize there is a hell of a lot of information missing, and in my opinion (not fact, yet) this leads to overfitting of EA params and poor out-of-sample performance.

Even eyeing up the 200 data, you get one bar per minute; if you have watched a chart for any length of time you will know a lot can happen in one minute! Even more unknowns are introduced when 'fractal tick modeling' is in play. In response to your very first question, why don't I use the 200 data? I think you should re-read my first post.
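
To make the "a lot can happen in one minute" point concrete, here is a rough Python sketch of what collapsing ticks into M1 bars throws away. The tick prices and times are invented for illustration, and `ticks_to_m1` is just a name I made up; it assumes the ticks arrive in chronological order.

```python
from datetime import datetime

def ticks_to_m1(ticks):
    """Collapse (timestamp, price) ticks into one OHLC bar per minute.
    Assumes ticks are already sorted by time."""
    bars = {}
    for ts, price in ticks:
        minute = ts.replace(second=0, microsecond=0)
        bar = bars.get(minute)
        if bar is None:
            # First tick of the minute sets all four prices.
            bars[minute] = {"open": price, "high": price,
                            "low": price, "close": price, "ticks": 1}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["ticks"] += 1
    return bars

# Hypothetical EURUSD ticks (made-up values).
ticks = [
    (datetime(2006, 9, 29, 12, 0, 1), 1.2670),
    (datetime(2006, 9, 29, 12, 0, 20), 1.2675),
    (datetime(2006, 9, 29, 12, 0, 45), 1.2668),
    (datetime(2006, 9, 29, 12, 0, 59), 1.2672),
    (datetime(2006, 9, 29, 12, 1, 5), 1.2671),
]

for minute, bar in ticks_to_m1(ticks).items():
    print(minute, bar)
```

The bar keeps only four prices per minute; the order in which the high and low were hit, and every tick in between, is gone, and that is what the tester has to reconstruct when it models ticks from M1 data.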

 

Update: I've been doing some out-of-sample testing tonight. I have a couple of EAs which have a profit factor of about 1.7 over a year; trying them on the previous year's data now has them making small profits, as opposed to totally bombing.

I would take this as evidence that there is less curve fitting going on over the tick data, therefore my hope is that higher performing systems will do even better on out of sample data. I guess I answered my own question.
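
For reference, the comparison I am describing can be sketched in a few lines of Python. Profit factor is gross profit divided by gross loss; the trade results below are invented numbers, picked only so that the in-sample figure comes out at 1.7, and are not taken from my actual reports.

```python
def profit_factor(trade_results):
    """Gross profit divided by gross loss for a list of per-trade P/L values."""
    gross_profit = sum(p for p in trade_results if p > 0)
    gross_loss = -sum(p for p in trade_results if p < 0)
    if gross_loss == 0:
        # No losing trades: undefined ratio, reported here as infinity
        # (or 0.0 when there were no winners either).
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss

# Hypothetical per-trade results (made-up values).
in_sample = [50, -20, 35, -30]      # the year the EA was optimized on
out_of_sample = [30, -25, 20, -20]  # the previous year, never optimized on

print("in-sample PF:     %.2f" % profit_factor(in_sample))
print("out-of-sample PF: %.2f" % profit_factor(out_of_sample))
```

A large drop from the in-sample to the out-of-sample figure is the symptom of curve fitting; an out-of-sample profit factor that stays above 1 means the EA at least still makes small profits on data it was never tuned on.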
