Are backtesting and optimization valid instruments?

Hi folks,

I’m quite new to the fx world and I’m trying to learn and experience as much as possible. I developed a set of EAs, which I backtested (using Alpari data) and optimized for several pairs. The results of these simulations are quite good but, from a purely mathematical point of view, how accurate and reliable are the data coming from backtesting and optimization?

Market price changes have several causes, and many of them are fundamental-related. Moreover, when I’m optimizing an EA, how can I trust that I’m really getting optimized parameters and not just curve-fitted parameters? In fact, there is always some set of parameters that makes the EA super-profitable on the tested history.

I’m writing this post because, as I said, I’m new to this world, and before spending a lot of time and effort optimizing and backtesting EAs just to see them fail miserably in demo (or live!), I would like to hear the opinion of expert and experienced guys who have already faced these issues.

Thanks for your help.

Backtesting is almost useless, and the main reason is the historical data that most people use. The old saying “garbage in, garbage out” certainly applies in this situation. Also, some strategies (basket trading, scalping) can’t be backtested, so be sure you understand the subtleties before wasting your time.

Hi CodeMeister, thanks for replying. When you say:

Do you mean that with good quality historical data it is possible to get something useful from backtesting? I’m more concerned about the mathematical reasons behind it: how can you model something that is driven by reasons totally outside mathematics?

Totally agree on the fact that certain strategies cannot be backtested…

Hmm… I can’t speak from personal experience because when I realized the effort and expense involved in acquiring proper data, I didn’t pursue it. I have read other people’s opinions, which are generally positive about the merits of backtesting. Perhaps you should review other threads; this subject has been discussed many times.

EDIT: this is a good place to start

http://forums.babypips.com/expert-advisors-automated-trading/52287-tutorial-complete-backtesting-analysis-setup-100-free.html

Thanks for the link. It looks like the shorter the timeframe (e.g. 1 min), the more important it is to have accurate data.
Perhaps with EAs acting on a 1hr or 1day timeframe, getting good data is easier.

What do you mean when you say:

Did you stop backtesting/optimizing, or do you accept what you can get easily and for free?

Not really. How are you going to cope with a fictitious spike in the data? Some are 100+ pips and easy to spot. The 20 pip spikes aren’t. A more common problem is missing data. Again, hours of missing data are easy to spot, but 10 minutes are not. Of course, once you detect these deficiencies, the challenge is to correct them or somehow skip over them.
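To make the detection part concrete, here is a rough Python sketch, not a definitive filter. It assumes 1-minute bars, a 4-decimal pair (pip = 0.0001), and hypothetical function names and thresholds (`find_gaps`, `find_spikes`, 5 minutes, 20 pips) that you would have to tune. It cannot tell a fictitious spike from a genuine news spike; it only lists candidates to inspect by hand.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap=timedelta(minutes=5)):
    """Return (previous, current) pairs of consecutive bar times that are
    further apart than max_gap. Weekend closes are flagged too, so the
    caller has to ignore those."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > max_gap:
            gaps.append((prev, cur))
    return gaps


def find_spikes(closes, threshold_pips=20, pip=0.0001):
    """Return indices of bars whose close sits more than threshold_pips
    above (or below) BOTH neighbouring closes, i.e. the price jumps away
    and comes straight back one bar later."""
    limit = threshold_pips * pip
    spikes = []
    for i in range(1, len(closes) - 1):
        move_in = closes[i] - closes[i - 1]
        move_out = closes[i] - closes[i + 1]
        same_side = move_in * move_out > 0  # above both or below both
        if same_side and abs(move_in) > limit and abs(move_out) > limit:
            spikes.append(i)
    return spikes
```

Correcting or skipping the flagged bars is a separate decision; this only covers finding them.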

I think backtesting is useful for 2 purposes even with crappy data.

  1. To quickly learn a new strategy.
  2. To verify the correctness of indicator and EA code.

To use backtesting as a means for optimization is useless.

I rarely perform optimization when I am developing systems. The risk of curve-fitting is a lot higher than what most people would think.
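If you do optimize, one common sanity check is to hold back part of the history: pick the parameters on the first part only (in-sample), then run them unchanged on the part they have never seen (out-of-sample). A minimal Python sketch, assuming a hypothetical `score(params, bars)` function that runs your backtest on the given bars and returns a number where higher is better (net profit, for example):

```python
def split_in_out(bars, in_sample_fraction=0.7):
    """Split the history chronologically: the first part is used for
    optimization, the rest is held back for validation."""
    cut = int(len(bars) * in_sample_fraction)
    return bars[:cut], bars[cut:]


def optimize_then_validate(bars, candidates, score, in_sample_fraction=0.7):
    """Choose the best candidate on the in-sample bars only, then report
    how that same candidate scores on the unseen out-of-sample bars.

    `score` is a placeholder for your own backtest routine."""
    in_sample, out_sample = split_in_out(bars, in_sample_fraction)
    best = max(candidates, key=lambda params: score(params, in_sample))
    return best, score(best, in_sample), score(best, out_sample)
```

A large drop from the in-sample score to the out-of-sample score suggests the parameters were fitted to the past rather than to anything that persists. A good out-of-sample score is not proof either, only a weaker reason for suspicion.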

If you want the most accurate testing results, tick/1-minute data is what you want. As for gaps and invalid spikes caused by technical errors rather than actual market moves, there are ways of filtering the data. You can also collect data from several different sources and fill in the gaps by combining them. The exact procedure is 100% up to the trader/systems designer to figure out. A good start is this PDF: http://www.tickdata.com/pdf/Tick_Data_Filtering_White_Paper.pdf.
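As a rough illustration of combining sources (a sketch only, not the method from the PDF), the Python below assumes each source is a dict mapping a bar timestamp to a bar, and that both sources already use the same timezone and bar convention. The name `merge_sources` is made up for this example.

```python
def merge_sources(primary, secondary):
    """Combine two bar histories keyed by timestamp.

    Bars from `primary` are kept wherever they exist; `secondary` is used
    only for timestamps that `primary` is missing. Returns the merged
    history in timestamp order and the list of timestamps that were filled."""
    filled = sorted(set(secondary) - set(primary))
    merged = dict(secondary)
    merged.update(primary)  # primary wins on overlapping timestamps
    return dict(sorted(merged.items())), filled
```

Prices from different brokers rarely match to the pip, so bars filled this way are worth marking and treating with some caution.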

Can you recommend any text/book that discusses the risk of curve-fitting?

Honestly, I feel quite confused: I created several EAs which trigger trades on the basis of common indicators (Stochastic, MACD, SMAs) and, to date, I have optimized them on the basis of the historical data. Now you are telling me that this is useless. I can understand it, but I’m curious to know what you guys are doing: no optimization at all? Backtesting only to gain confidence in the EA?

Moreover, my EAs are based on the 1hr timeframe. Do you think it is really necessary to get the tick/1-minute data, or can I simply rely on the Alpari 1hr data? I do not want to appear lazy here; I’m just weighing the time and effort one needs to spend against the real improvement one gets.