Data for backtesting - page 2

 
Fernando Carreiro #:

If you are using MT5, then there is no need to use TickStory, and Tick Data Suite (TDS) is not even meant to be used with MT5, only MT4.

You don't even need to use Dukascopy data, nor is it ideal. Just use the data provided by the broker. If you use a reliable broker, they often provide quality historical data quite far back.

Using the broker's real tick data also allows you to test with real, floating spreads instead of fixed spreads.

Don't be stuck in the MT4 mind-set. MT5 is different and you don't need 3rd party tools for testing with real and quality historical data.

Unfortunately, what you say does not hold in practice. Many well-known brokers still provide historical data of no more than 50% quality.

In summary, it does not mean that 3rd party data is never needed when using MT5. If needed, reference data such as Dukascopy will be much more meaningful than the very low quality data of many brokers.
 
Cenk #: Unfortunately, what you say does not hold in practice. Many well-known brokers still provide historical data of no more than 50% quality. In summary, it does not mean that 3rd party data is never needed when using MT5. If needed, reference data such as Dukascopy will be much more meaningful than the very low quality data of many brokers.

Forex is a decentralised exchange. The data from one broker cannot necessarily be compared to another, especially if they use different data feeds and liquidity providers.

So, how do you know if the broker's data is not more than 50% quality? Did you collect it over a long time to then compare it to the historical data?

 
Fernando Carreiro #:

Forex is a decentralised exchange. The data from one broker cannot necessarily be compared to another, especially if they use different data feeds and liquidity providers.

So, how do you know if the broker's data is not more than 50% quality? Did you collect it over a long time to then compare it to the historical data?

I hope everyone had a good Christmas.


I got in touch with Darwinex and they said that they have tick data going back around 5 years. The 1M bar data that I request from further back than that in the symbol manager (Ctrl+U) apparently comes from MetaQuotes.


I'm trying to balance accuracy with processing time, that is the issue here. I don't want to waste resources on something that will affect the results by 10% etc.


I'm interested to hear what other people do when testing a swing strategy trading on an M15 time frame over, say, 15 years. Happy to use tick data but just want to optimise the workflow if I can.

 
Fernando Carreiro #:

Forex is a decentralised exchange. The data from one broker cannot necessarily be compared to another, especially if they use different data feeds and liquidity providers.

So, how do you know if the broker's data is not more than 50% quality? Did you collect it over a long time to then compare it to the historical data?

If the data you have is of low quality and useless, it hardly matters that data from other liquidity providers differs from the broker's.

On the other hand, the source you call other liquidity providers is Dukascopy data, whose reliability has been proven all over the world, and which is used as "reference data" by many professional coders and developers.

If the data provided by the broker is significantly different from Dukascopy data, the reliability and transparency of that broker should already be questioned.
 
@Cenk #: If the data you have is of low quality and useless, it hardly matters that data from other liquidity providers differs from the broker's. On the other hand, the source you call other liquidity providers is Dukascopy data, whose reliability has been proven all over the world, and which is used as "reference data" by many professional coders and developers. If the data provided by the broker is significantly different from Dukascopy data, the reliability and transparency of that broker should already be questioned.

I will ask again, how do you know the quality of the data? You can only know that if you collect it in real time over a long period and then compare it later with the historical data presented by the broker at a future date.
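A minimal sketch of such a comparison, assuming you have logged M1 closes yourself and later exported the broker's history for the same period. The function name, the `{timestamp: close}` layout and the tolerance are hypothetical choices for illustration, and the values are made up, not real quotes:

```python
def compare_bars(collected, historical, tolerance=0.0):
    """Compare two {timestamp: close} mappings of M1 bars.

    Returns three sorted lists of timestamps:
      missing    - recorded live but absent from the broker history
      extra      - present in the broker history but never seen live
      mismatched - present in both, closes differ by more than `tolerance`
    """
    live, hist = set(collected), set(historical)
    missing = sorted(live - hist)
    extra = sorted(hist - live)
    mismatched = sorted(
        t for t in live & hist
        if abs(collected[t] - historical[t]) > tolerance
    )
    return missing, extra, mismatched


# Illustrative values only.
collected = {
    "2023.01.02 00:00": 1.07010,
    "2023.01.02 00:01": 1.07015,
    "2023.01.02 00:02": 1.07020,
}
historical = {
    "2023.01.02 00:00": 1.07010,
    "2023.01.02 00:02": 1.07050,
    "2023.01.02 00:03": 1.07030,
}

missing, extra, mismatched = compare_bars(collected, historical, tolerance=0.00005)
print("missing:", missing)
print("extra:", extra)
print("mismatched:", mismatched)
```

If all three lists stay short over a long collection period, the broker's history is at least consistent with what it actually streamed, whatever another feed shows.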

Assuming that the collected and historical data are consistent, then even a significant difference from Dukascopy does not mean the data is "bad". It simply means that the broker is a "dealing desk" or Market Maker, or that its liquidity provider is different from Dukascopy's. It does not mean that the data is of low quality.

The "other" data feed or liquidity provider is not necessarily Dukascopy. There are many major banks, licensed investment companies and brokers that serve as liquidity providers all over the world.

Besides, if you plan to trade with "broker A" then it is a good thing to test your EA with data from "broker A" as well, and not just "Dukascopy".

 
EdFuk #: I hope everyone had a good Christmas. I got in touch with Darwinex and they said that they have tick data going back around 5 years. The 1M bar data that I request from further back than that in the symbol manager (Ctrl+U) apparently comes from MetaQuotes. I'm trying to balance accuracy with processing time, that is the issue here. I don't want to waste resources on something that will affect the results by 10% etc. I'm interested to hear what other people do when testing a swing strategy trading on an M15 time frame over, say, 15 years. Happy to use tick data but just want to optimise the workflow if I can.

You don't have to use tick data if your strategy bases its open and close prices on M15 bar data. The resolution required depends on how your strategy works.

If it opens at the opening price of a bar, and closes also at the opening of a bar, then you can easily test against open prices only and be 99% accurate (there is always some slippage in live trading, so it can't be 100%).

If your T/P or S/L is instead a specific offset price and not necessarily the opening bar price, then you need more accuracy, but maybe even then M1 data may be sufficient.

Since we do not know your strategy rules, only you can know how statistically important it is to use tick data or not.
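One rough way to judge this for your own strategy is to count how often a single bar contains both the T/P and the S/L, because those are the bars where coarse data cannot tell you which was hit first. A minimal sketch for a long position, ignoring spread for simplicity; the function name and the sample prices are hypothetical:

```python
def classify_bar(high, low, take_profit, stop_loss):
    """Classify one bar for an open long position.

    'tp'        - only the take-profit lies inside the bar's range
    'sl'        - only the stop-loss lies inside the bar's range
    'ambiguous' - both lie inside the range, so the bar alone cannot
                  say which was reached first (finer data is needed)
    'none'      - neither level was reached
    """
    hit_tp = high >= take_profit
    hit_sl = low <= stop_loss
    if hit_tp and hit_sl:
        return "ambiguous"
    if hit_tp:
        return "tp"
    if hit_sl:
        return "sl"
    return "none"


# Illustrative (high, low) pairs for four bars, not real quotes.
bars = [(1.1010, 1.0995), (1.1032, 1.0978), (1.1005, 1.0975), (1.1035, 1.1001)]
results = [classify_bar(h, l, take_profit=1.1030, stop_loss=1.0980) for h, l in bars]
ambiguous_share = results.count("ambiguous") / len(results)
print(results, ambiguous_share)
```

If the ambiguous share on M15 bars is negligible, coarser modelling is probably enough. If it is not, repeating the count on M1 bars shows whether M1 resolves it or whether ticks are really needed.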

 
Fernando Carreiro #:

You don't have to use tick data if your strategy bases its open and close prices on M15 bar data. The resolution required depends on how your strategy works.

If it opens at the opening price of a bar, and closes also at the opening of a bar, then you can easily test against open prices only and be 99% accurate (there is always some slippage in live trading, so it can't be 100%).

If your T/P or S/L is instead a specific offset price and not necessarily the opening bar price, then you need more accuracy, but maybe even then M1 data may be sufficient.

Since we do not know your strategy rules, only you can know how statistically important it is to use tick data or not.

Yes, which is why I'm planning to use 1 minute OHLC data.


The question is what to use for the spread. TickStory uses the maximum spread for the bar, whilst MetaQuotes and Darwinex use the minimum spread for the bar. I did a little test for myself to see the difference and this is what I got.


This is using 1 minute OHLC modelling.


The Darwinex results use the Darwinex bar spread, which is the minimum for the bar. Although labelled spread=real, it is not a real spread, since this is not tick data.

The TickStory data (grey line) uses the maximum spread for the bar.

The 13.9 is the mean spread from the TickStory tick data, which I calculated.


As can be seen, there is a large variation, and the TickStory data is significantly different from the Darwinex "real" spread (grey vs red lines).


I suppose using the TickStory data (max spread) would be conservative.
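To make the min/max/mean distinction concrete, here is a minimal sketch that reduces tick spreads to one value per M1 bar in each of the three ways. It assumes ticks are already available as `(minute, spread_in_points)` pairs; that layout, the function name and the numbers are hypothetical:

```python
from collections import defaultdict
from statistics import mean


def bar_spreads(ticks):
    """Group (minute, spread) ticks by minute.

    Returns {minute: (min_spread, max_spread, mean_spread)}.
    """
    by_minute = defaultdict(list)
    for minute, spread in ticks:
        by_minute[minute].append(spread)
    return {m: (min(s), max(s), mean(s)) for m, s in by_minute.items()}


# Illustrative spreads in points, not real data.
ticks = [("00:00", 8), ("00:00", 12), ("00:00", 25), ("00:01", 9), ("00:01", 11)]
per_bar = bar_spreads(ticks)
for minute, (lo, hi, avg) in per_bar.items():
    print(minute, "min:", lo, "max:", hi, "mean:", avg)
```

A single wide tick pulls the bar's maximum far from the minimum, which is one plausible reason the two equity curves diverge so much.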


Does anyone else have a different approach?


The reason I'm doing this is to speed up the pipeline for identifying suitable strategies. Once I have identified ones that may be suitable, I will run them with tick data.


Effect of Spread on Equity Curve

 
EdFuk #: Yes, which is why I'm planning to use 1 minute OHLC data. The question is what to use for the spread. TickStory uses the maximum spread for the bar, whilst MetaQuotes and Darwinex use the minimum spread for the bar. I did a little test for myself to see the difference and this is what I got.

When you are testing against broker M1 data, you can set the spread in the Strategy Tester to a fixed value of your choosing (just like it was on MT4).



 
Fernando Carreiro #:

When you are testing against broker M1 data, you can set the spread in the Strategy Tester to a fixed value of your choosing (just like it was on MT4).



Sure, that is useful.

But I'm trying to figure out what a suitable value might be. Using a variable spread from either the broker or TickStory seems more appealing as it's on a per-bar basis, but maybe the maximum is too conservative and the minimum too optimistic. Maybe I'm thinking about it too much.


Ed

 
EdFuk #: Sure, that is useful. But I'm trying to figure out what a suitable value might be. Using a variable spread from either the broker or TickStory seems more appealing as it's on a per-bar basis, but maybe the maximum is too conservative and the minimum too optimistic. Maybe I'm thinking about it too much.
A suitable value might be the average, or the average plus/minus one standard deviation to compare the two to evaluate robustness.
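A minimal sketch of that idea, assuming you have a list of observed spreads in points. The function name is hypothetical, and the results are rounded to whole points on the assumption that the tester's fixed-spread setting takes an integer number of points:

```python
from statistics import mean, pstdev


def spread_scenarios(spreads):
    """Return three fixed-spread candidates in whole points:
    mean minus one standard deviation (floored at 0), the mean,
    and mean plus one standard deviation.
    """
    avg = mean(spreads)
    sd = pstdev(spreads)  # population standard deviation of the sample
    return {
        "low": max(0, round(avg - sd)),
        "mid": round(avg),
        "high": round(avg + sd),
    }


# Illustrative spreads in points, not real data.
scenarios = spread_scenarios([10, 12, 14, 16, 18])
print(scenarios)
```

Running the same test once per scenario and comparing the equity curves shows how sensitive the strategy is to the spread assumption.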