Why backtests are useless, EAs are flawed and their parameters are bad [DISCUSS]

Hi, I am Darwin and I am new here.

tl;dr In the next few weeks, I am planning to release my private Walk Forward Analyzer. Before I do, I want to make sure that beginners in mechanical trading also understand why common backtests are quite useless and why walk forward analysis is better in so many ways, so I decided to start this discussion.

I WILL RELEASE THIS TOOL FOR FREE AND OPEN SOURCE!

And as people keep asking why I release free stuff and why I post this thread on multiple forums (actually, I want to post it on all of them, so tell me if I have missed one), let me explain my motivations:
I am aiming for a job in the trading industry, and that is not easy to get. Being known in the community would help me a lot to reach this goal, so I decided to release some of my private tools for free and/or open source to make a name for myself over the next months. I am not here to sell you stuff, keep that in mind, please!

For the record: I do not claim that WFA is the holy grail in system testing, but I argue that it is way better than normal backtests.

THE GENERAL APPROACH TO SYSTEMATIC TRADING

Take a trading logic => Generate an EA that trades exactly like it => Optimise Parameters => Do a backtest => Trade it live => $$$

BUT: Everybody who has ever had an EA that was a money-printing machine in backtests (and that is very, very easy to achieve) knows that a good backtest does not imply profitable live trading.
And that is the problem. We need a test method whose evaluation we can rely on!

STATEMENT 1: Your parameter choices are not good

Well, every EA consists of 2 things:

  1. the logic/script itself
  2. the parameters (like periods for moving averages, stop loss values etc. Just everything that can be adjusted!)

The first thing is static and “given”. And a lot of traders only focus on this part.
But an EA can behave in very different ways, depending on the parameters. And there can be billions of possible parameter combinations (= parametersets). There are 3 approaches to determine the parameterset for live trading.

1. the intuitive approach (non adapting):
The trader just chooses the parameters based on his expert knowledge.
But, there are just too many possible parametersets. You can’t just “guess” them without evaluation and testing.
It does not matter how good you are, you are never good enough to “think through” this huge amount of possibilities in a reliable fashion!

2. the “optimise on all data” approach (non adapting):
The trader chooses the parameters based on all past data.
So, the configuration/parameters that worked best for the last 12 years, for example, are chosen to be traded live.
But optimising the parameters on so much, and therefore OLD, data is not a good way to stay up to date, because the market today is not the same as it was 12 years ago.

3. the “optimise on the last few years” approach:
The trader chooses the parameters based on the last few years of data.
But you can not just take this approach without testing how this “optimisation method” would have worked in the past.
And this is exactly what a walk forward analysis will do: “Optimise on the last few years”, but it tests this approach on all data in the past!

STATEMENT 2: A non-adapting EA can never make longterm profits

I mean, there is an infinite amount of possible trading-systems AND an infinite amount of ways the market can change AND an “infinite” amount of possible parameters for your trading-system.

Considering this, would you really want to bet money on the fact that you have a trading system that will always work, in the same way, without adaption, on all future market conditions? I would not.

"But why shouldn’t I be able to put a traders knowledge into a script and trade it?"
It’s simple: a trader always learns. He takes input from many sources, he has knowledge about the markets, and therefore he adapts his trading strategy so he can always stay as close to the markets as possible.
A simple EA script can’t do this; a Walk Forward Analysis can (to some extent).

Reason #1 why there are many “profitable” EAs out there
The EA really works, it has a sound strategy and was developed properly. But the markets always change, and they can change in infinite ways.
So, at some point, the traded market-inefficiency WILL(!) change, and the EA can not adapt to this change and therefore will lose its profitability.

Reason #2 why there are many “profitable” EAs out there
is that some of them, sound or unsound systems, are just lucky.
If you send 10,000 people to a casino and let them play for a while, some of them will be in profit after some time, just due to chance.
The same goes for EAs: if there are enough of them, some will really make a profit (even if they are in fact useless).
But as they keep trading, they will lose, as the probabilities are against them (same as in a casino).
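The casino argument can be checked with a few lines of Python. This is only a sketch under my own assumptions (not part of the tool discussed here): every "EA" takes 100 trades that win or lose 1 unit with equal probability, so none of them has any edge at all. The function name and the numbers are made up for illustration.

```python
import random

def count_lucky_strategies(n_strategies=10_000, n_trades=100, seed=42):
    """Count zero-edge strategies that end in profit purely by chance."""
    rng = random.Random(seed)
    winners = 0
    for _ in range(n_strategies):
        # Assumption: each trade wins or loses 1 unit with equal probability.
        pnl = sum(rng.choice((-1, 1)) for _ in range(n_trades))
        if pnl > 0:
            winners += 1
    return winners

lucky = count_lucky_strategies()
print(f"{lucky} of 10000 zero-edge strategies are in profit after 100 trades")
```

A large share of these useless strategies ends up in profit, which is the survivorship effect described above. With a negative edge per trade (as in a real casino), the share shrinks the longer they keep trading.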

Reason #3 why there are many “profitable” EAs out there
is the huge risk some EAs take (grid trading, martingale systems etc).
These EAs take huge risks to make small profits. But at some point the risk strikes, and then the EA will lose (all) money.
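To put a number on that risk, here is a small Python sketch. It assumes the classic martingale rule (double the stake after every loss), which is my own simplification; real grid or martingale EAs use many variations. The function name is hypothetical.

```python
def martingale_exposure(base_stake, losing_streak):
    """Return (next_stake, total_lost) after `losing_streak` straight losses
    when the stake is doubled after every loss."""
    # Stakes so far were base, 2*base, 4*base, ... so the sum is base * (2^n - 1).
    total_lost = base_stake * (2 ** losing_streak - 1)
    next_stake = base_stake * 2 ** losing_streak
    return next_stake, total_lost

for streak in (1, 5, 10):
    stake, lost = martingale_exposure(1.0, streak)
    print(f"after {streak:2d} losses: next stake {stake:7.0f}, already lost {lost:7.0f}")
```

The winning trade only ever recovers the losses plus one base stake, while the required stake grows exponentially with the length of the losing streak.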

Reason #4 why there are many “profitable” EAs out there
is the short timespan over which they are profitable. It’s no magic to make a profit for months, but it’s hard to do it for many, many years.

STATEMENT 3: A backtest does not tell you anything about the future performance

Well, don’t get me wrong, I strongly hold the opinion that simulations on the past are the only way to really test a trading system.
But what does a backtest tell you? Just that your system performed well in the past.

But is trading really about having a system that represents the past?
No, trading is about designing a system on the past and then trading it in the future.

THIS IS A FUNDAMENTALLY DIFFERENT QUESTION FROM THE ONE A BACKTEST CAN ANSWER!
You want to answer the question “how good will my live trading performance be” with the answer to the question “how good did my system perform in the past”. That logic is flawed, of course.

"But if I use out-of-sample data to verify my backtest…"
Out of sample testing is a good idea. But you only have one optimisation dataset and one test dataset, which is not very reliable. Walk forward analysis is, in a way, out-of-sample testing on steroids. It uses the same method, but generates 10-1000 optimisation/test dataset pairs.

"So, are all backtests useless?"
No, they can help you build sound and good trading systems, if done right.
But my point is that good performance in a backtest does not guarantee that the system is also good in live trading.

Trading is a game of probabilities, and the probability that an EA is profitable, based only on a good backtest, is very, very low.

"But why should a system that performed well over the past simply stop working?"
The main reason, and it might be the most important one, is OVERFITTING.
That means the EA tackles a “random” inefficiency of the market that just happened to be there in the past, but that was no real thing.

I found a very good picture that explains overfitting:

[Image: IE market share vs. murder rate]

You see? There are infinitely many such useless “relationships” in the price data, and most EAs tackle one of them, which gives them very good backtest results. But that was just due to luck, and these “relationships” will not hold in the future.
And as there are a lot more of these overfitted solutions than sound ones, the chances are very high that you optimised towards such an overfitted system.
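The effect is easy to reproduce. The following Python sketch is an illustration under my own assumptions: the "market returns" and all 1000 candidate "signals" are pure random noise, so no real relationship exists. We pick the signal that correlates best with the returns in-sample and then look at the same signal out-of-sample.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
n_in, n_out, n_candidates = 100, 100, 1000

# Assumption: returns and signals are independent noise, nothing is predictable.
returns = [rng.gauss(0, 1) for _ in range(n_in + n_out)]
candidates = [[rng.gauss(0, 1) for _ in range(n_in + n_out)]
              for _ in range(n_candidates)]

# "Optimise": keep the signal with the highest in-sample correlation.
best = max(candidates, key=lambda s: pearson(s[:n_in], returns[:n_in]))
best_in = pearson(best[:n_in], returns[:n_in])
best_out = pearson(best[n_in:], returns[n_in:])
print(f"in-sample correlation:     {best_in:+.2f}")
print(f"out-of-sample correlation: {best_out:+.2f}")
```

With enough candidates, the best in-sample correlation looks convincing even though it was found in noise. Since the signal has no real connection to the returns, there is no reason to expect the out-of-sample value to be anywhere near it.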

The other reason is that the markets always change. So even if you have a non-overfitted, well-performing backtest, you will run into the several problems described in Statement 2.

THE SOLUTION: WALK FORWARD ANALYSIS

A Walk Forward Analyser is an external program that can take EVERY EXPERT ADVISOR and, using Metatrader4, do the analysis.
SO YES, YOU CAN CONTINUE TO USE YOUR ALREADY WRITTEN EAs!

Well, some of you will already know how a walk forward analysis works. For the others, here is a short overview; I will explain the process in more detail in a future thread.

A WFA takes, for example, the data from 2000 to optimise your system, then tests it on 2001 (which is, from the point of view of the optimisation, the “future” or “live trading”).
Then it walks forward: it optimises the system on 2001 and tests it on 2002, then optimises on 2002 and tests on 2003, etc.
Do this until you have walked through your whole data, and only consider the “live trading” parts in your evaluation.
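The loop described above can be sketched in Python as follows. This is not the code of the analyser announced in this thread, just a minimal illustration. The names `walk_forward_windows`, `walk_forward_analysis`, `toy_evaluate` and the toy fitness function are all hypothetical; in practice `evaluate` would run a backtest of the EA with the given parameters on the given period.

```python
def walk_forward_windows(periods, train_len=1, test_len=1):
    """Return (train, test) slices that roll forward through `periods`."""
    windows = []
    start = 0
    while start + train_len + test_len <= len(periods):
        train = periods[start:start + train_len]
        test = periods[start + train_len:start + train_len + test_len]
        windows.append((train, test))
        start += test_len  # walk forward by one test window
    return windows

def walk_forward_analysis(periods, param_grid, evaluate):
    """Optimise on each train window, record only the test-window result."""
    out_of_sample = []
    for train, test in walk_forward_windows(periods):
        best = max(param_grid, key=lambda p: evaluate(p, train))
        out_of_sample.append((test, best, evaluate(best, test)))
    return out_of_sample

# Toy fitness (assumption): the ideal MA period drifts by 1 every year.
def toy_evaluate(ma_period, years):
    return sum(-(ma_period - (year - 1995)) ** 2 for year in years)

years = [2000, 2001, 2002, 2003]
results = walk_forward_analysis(years, [5, 6, 7, 8], toy_evaluate)
for test, best, score in results:
    print(f"test on {test}: parameter {best}, out-of-sample score {score}")
```

Only the third element of each result (the score on the unseen test window) goes into the final evaluation. The in-sample score of the chosen parameter is never reported.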

You see? With this simple tactic you can tackle the 3 problems I have described above: the lack of adaptability, the need for an evaluated parameter selection process and the uselessness of a “past-performance” backtest.

And then, when you want to trade the system live, you do the exact same thing that you have tested:

  1. Optimise on the last available data, with the same optimisation settings you tested in WFA
  2. Trade it in the future, with the same procedure you tested in WFA

And that is the main difference to normal backtests: YOU TRADE YOUR EA IN THE EXACT SAME WAY AS YOU TESTED IT!!!

So, this is what I wanted to say, now feel free to tell me what you think about this topic, please :slight_smile:

  • Darwin

I am just as skeptical of Walk Forward Analysis as I am of backtesting. I must admit I have little knowledge and no hands-on experience with WFA, but plenty of backtesting experience. My main concern is the data quality. That is a huge problem with backtesting which you didn’t mention. I suspect the same will happen in WFA: garbage in, garbage out. I am hoping to see your comments in future posts about data quality.

Even with WFA, data quality will play a large role in the results. Like you have already stated: garbage in, garbage out. In my opinion, WFA is necessary after one has used the optimization process to determine the “correct” settings. WFA in this way, acts as a check to ensure the settings are not curve-fitted. Besides that, there really isn’t much reason to opt for a WFA over a regular backtest.

What about data quality? Are you referring to the MT4 default history data, which has huge gaps?
Because if you have good enough data and use high enough timeframes (>= H1), there should not be a problem with data quality, or do I misunderstand you? :slight_smile:

No, that is the worst way to use WFA, because then you have already “pre-curvefitted” your parameters (which means all the data is already IN your parameters).

Then, when you do a WFA, optimise a new parameterset (a “child” of the pre-fitted one) and test this new parameterset “in the future”, this “future data” is already IN your parameterset, because you used it when you “pre-(curve)fitted” it. That makes the whole WFA process useless :slight_smile:

And yes, there is a very very fundamental reason to opt for a WFA over regular backtests, because as I said, it will do out-of-sample tests ONLY (fitting on some data, then testing on different data “in the future”).

A backtest does in-sample tests (fitting on some data and then testing on the same data, which is useless).

This difference is as fundamental as the one between a BMW and a Lada :wink:
But please, tell me what is still unclear :slight_smile:

-Darwin

Of course I am referring to the broker history data. That is what about 90% of the posters to this forum (and FF) use judging by their comments. And I dare say that of that group, only a very small percentage is aware of the problems in the data. So, keep in mind these are the same people who are reading this thread.

As far as acquiring good enough data, that is a formidable challenge for someone who trades the 15M charts like I do. So I guess I was right to be skeptical of WFA as being useful to the retail trading community but maybe I can learn the theory for my benefit.

I think you are misunderstanding what I am trying to say.

The entire WFA process uses optimization in the lookback period to find the “better” settings and then it is run on new data to validate them. The process is then repeated a number of times and each time new settings will be found. This acts as a check to see if the “edge” that the strategy is capturing is in fact valid and not just random.

I think we’re just describing it differently although trying to convey the same things. :stuck_out_tongue:

Darwin, you mentioned “I WILL RELEASE THIS TOOL FOR FREE AND OPEN SOURCE!” Where can I get a copy? Thanks in advance.

I would not say that WFA is of limited use for the retail trading community, it’s just of limited use for such small timeframes. :wink:
But that is not about your data. It’s just not possible to reliably backtest or walk-forward-analyse anything on such a small timeframe, as things like slippage, broker-dependent tick streams, network lags, interpolated tick simulations and so on will mess up your simulation results.

That’s just the price you pay if you want to trade on small timeframes: you will have to do it manually ;(
I am not saying that trading this with an algo is impossible, not at all, just that it is a lot easier to run algos on >= H1, and then the average data quality is good enough (if you have M1 data without gaps).

-------------------------------------------------------------------------------------------------------------------------------------

The entire WFA process uses optimization in the lookback period to find the “better” settings and then it is run on new data to validate them. The process is then repeated a number of times and each time new settings will be found. This acts as a check to see if the “edge” that the strategy is capturing is in fact valid and not just random.

I think we’re just describing it differently although trying to convey the same things. :stuck_out_tongue:

I think you are right. :slight_smile:
And as long as you don’t use all your data to optimise some parameters, then put them into a WFA and treat the formerly used data as “unseen/out-of-sample”, it’s all good.
Otherwise you will run into dangerous territory.

-------------------------------------------------------------------------------------------------------------------------------------

Darwin, you mentioned “I WILL RELEASE THIS TOOL FOR FREE AND OPEN SOURCE!” Where can I get a copy? Thanks in advance.

Hey, yes, I will release it in ~1 week (at least that is the plan)

I have to agree to all statements by DarwinFX. I think this is also the experience of all serious trade system developers. A system developed without WFA is useless, except for rare cases where parameters are not important.

Only problem is that even WFA does not guarantee a profitable system. Systems exploit market inefficiencies that can disappear any time - and this does really happen more often than you think. WFA can not determine the lifetime of a system. The mere process of developing a system and finding an edge already constitutes an in-sample preselection and reduces the quality of subsequent tests, even WFA.

Hey, that is a good point and I have to agree with it - thanks for pointing that out :slight_smile:

At the moment the only way to recognize this kind of problem is to watch the EA while trading (and to carefully analyse the fitness values of the optimisation procedure).

Though, I am working on methods to recognize and handle this kind of problem in an algorithmic and automated fashion, but these methods are still only ideas as I am busy doing other stuff.

Any thoughts on this, any good ideas on how to recognize this problem? I am open to all suggestions :slight_smile:

-Darwin

I think the best way to write EAs is to only use “on new candle” triggers. So, for example, on a new 1H candle the algorithm checks whether a trade needs to be entered or stopped. That way you only use the Open and Close prices every hour instead of tick data (which can never be fully trusted). I’ve programmed an EA that does exactly this, and the backtest results using “control points” and “every tick” (using 99.9% Dukascopy tick data) are exactly the same. It has also been running live for one year now, with exactly the same results as a backtest started 1 year back. If I try to program an EA which opens and closes trades ‘on the fly’ (for example: when price is higher than X), the backtest results for ‘control points’ and ‘tick data’ are always VERY different, and therefore unreliable.
So it is possible to program an EA and backtest it reliably, but you need to use only candle open, close, high and low prices. The higher the timeframe, the better.
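For readers who have not seen an “on new candle” trigger, here is the idea as a language-neutral Python sketch (an actual EA would be written in MQL4; the class and method names here are my own, not from any library). Tick times are assumed to be epoch seconds.

```python
class NewBarDetector:
    """Report True only for the first tick of each new bar."""

    def __init__(self, bar_seconds=3600):
        self.bar_seconds = bar_seconds  # 3600 = one H1 bar
        self.last_bar_index = None

    def is_new_bar(self, tick_time):
        # Integer division maps every tick to the bar it belongs to.
        bar_index = tick_time // self.bar_seconds
        if bar_index != self.last_bar_index:
            self.last_bar_index = bar_index
            return True
        return False

detector = NewBarDetector()
ticks = [0, 10, 3599, 3600, 3700, 7200]
flags = [detector.is_new_bar(t) for t in ticks]
print(flags)
```

The trading logic runs only when `is_new_bar` returns True, so whatever happens between bar opens cannot change the decisions. Note that in this sketch the very first tick also counts as a new bar.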

You talk about opening trades on the open of a new bar. I think most EAs do that. But when do you close a trade? Only on the open of a new bar as well? Most EAs do not do things like that. They usually have some fixed SL/TP or TS. How can you trust results unless you use tick data?

EDIT:

Would you care to comment on testing with data gaps? Sometimes there are a few missing 1M bars, sometimes 20 or more. They might occur in the middle of a 1H bar or span the close/open. Unless the data is thoroughly cleansed and validated, you cannot trust the results from backtesting.
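A first validation step could look like the following Python sketch. It assumes the bar open times are available as a sorted list of epoch seconds (for example from an exported history file); the function name is hypothetical. It also flags weekends and quiet minutes without any tick, so the output still needs a human look.

```python
def find_gaps(bar_times, step=60):
    """Return (previous_time, next_time, missing_bars) for every hole
    in a sorted list of M1 bar open times given in epoch seconds."""
    gaps = []
    for prev, curr in zip(bar_times, bar_times[1:]):
        missing = (curr - prev) // step - 1
        if missing > 0:
            gaps.append((prev, curr, missing))
    return gaps

# Bars at 0s, 60s, 120s, then nothing until 300s: two bars are missing.
print(find_gaps([0, 60, 120, 300, 360]))
```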

Don’t get me wrong: I use tick data to do the backtesting, but as you mention yourself, most data has gaps on the 1-minute timeframe. If you only use the closing prices of the 1H candles or above, you don’t have this problem so much, because I believe the 1H “open, close, high and low” prices are much more accurate. Of course, not all strategies can be converted to use only open and close prices for entries and exits. I think scalping strategies especially have major problems with backtesting results being unusable. You can never be certain of the results of the 1-minute movements between each new 1H candle, so I create strategies that don’t need this 1-minute data.

I agree that backtesting for scalping is useless.

I don’t agree with your comment about the 1H bars. They are made from 60 (or fewer) 1M bars. D1 data might avoid the problem with missing data, but not 1H.

Why did you not comment on exiting a trade?

I exit the trade only on bar close as well, and the same goes for the trailing stop etc. Are the 1H open, close, high and low received from the broker determined by the 1M data? Don’t they keep these open, close, etc. data separately? Didn’t know that. Anyhow, I’ve been checking live trading against backtests for the last year, and they are identical. Of course, the data that I’ve used for this backtesting is the same data that I’ve been receiving during that year of live trades… but I also backtested it with downloaded data from another broker to see the difference… and it was 99% identical. Backtesting will never be 100% accurate, because it is almost impossible to emulate the variable spread, slippage, etc., but I think it is possible to backtest an EA to such a degree that you can trust it will also be profitable trading live, with some minor differences of course. In the end, what we want is an EA that is consistently profitable, and if it is, it doesn’t matter if it is 1 or 2% less profitable than the backtests.

Of course, the only real test is the live forward test… but you can never test, retest and optimize using that method, because it would take years, even dozens of years, before you get anything useful… So backtesting, with its faults, is not perfect, but I think it is very useful.

I can only agree with that: only trade on bar open.
But not just because of backtest<->live trading consistency, but also because that way you can carry out “Open Price Only” backtests, allowing you to make about 500,000 single backtests in 24 hours (with heavily patched Metatrader4 instances and an algorithm doing the tests, of course not manually).

And being able to generate and analyse such a massive amount of data for a system is a huge benefit compared to “Every tick” EAs :slight_smile:

Regarding the Gaps in MT4-Data: You can use MT5 to download complete data and then convert it!

Regarding the problem of missing M1 candles: it should not be such a problem, as the bars are not really made up of all 60 M1 candles, but of 4 concrete values taken from these 60 candles. If the M1 candle that holds the High, Low, Open or Close is missing, it might introduce a small bias, of course. But it should not be a problem for the reliability of a backtest carried out on this data :slight_smile:
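To make the “4 concrete values” point tangible, here is a Python sketch that builds H1 bars from M1 bars. It assumes bars are tuples of (open time in epoch seconds, open, high, low, close); the function name and the sample prices are made up. How a broker or MT4 really builds its higher-timeframe history is not something this sketch claims to reproduce.

```python
def aggregate_h1(m1_bars):
    """Build H1 bars from time-sorted M1 bars.
    Returns {hour_start: [open, high, low, close]}."""
    hours = {}
    for t, o, h, l, c in m1_bars:
        hour = t - t % 3600
        if hour not in hours:
            hours[hour] = [o, h, l, c]       # first M1 bar sets the open
        else:
            bar = hours[hour]
            bar[1] = max(bar[1], h)          # highest high so far
            bar[2] = min(bar[2], l)          # lowest low so far
            bar[3] = c                       # latest close wins
    return hours

m1 = [
    (0,   1.00, 1.20, 0.90, 1.10),  # holds the H1 open and low
    (60,  1.10, 1.50, 1.00, 1.30),  # holds the H1 high
    (120, 1.30, 1.40, 1.20, 1.25),  # holds none of the 4 values
    (180, 1.25, 1.30, 1.20, 1.22),  # holds the H1 close
]
full = aggregate_h1(m1)[0]
harmless_gap = aggregate_h1([b for b in m1 if b[0] != 120])[0]
harmful_gap = aggregate_h1([b for b in m1 if b[0] != 60])[0]
print(full, harmless_gap, harmful_gap)
```

Dropping the candle at 120s leaves the H1 bar unchanged, while dropping the one at 60s lowers the H1 high. So a gap only matters when it hits one of the candles that carry the open, high, low or close.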

-Darwin

How do you avoid over optimisation?

The biggest issue with data mining, i.e. running 1000s of parameter sets, is that often the best strategy is the one that arbs a deficiency in your backtest. This could be something like what codemeiser pointed out, such as not getting stopped out properly because you are only using 1H bars. Another example is scalping strategies, which normally get eaten by the backtest not having spreads properly incorporated. Whatever the issue with the backtest is, a hardcore optimisation session will always find it and make you think you’ve hit the jackpot.

True, therefore algotrading is not about optimisation, it is about using algos as tools to look at the whole picture at once.

If you split your dataset into many sub-datasets (100 in-sample & out-of-sample pairs, for example) and then evaluate all possible parameter combinations of your system on each of them (optimise on in-sample, evaluate all candidates with profit > 0 on out-of-sample), you suddenly have something like 100,000-500,000 single data points.

You can then use this massive amount of data, which should incorporate all the information you can get on this system, to make some cool diagrams, and then you can easily spot the characteristics of your system, without any optimisation.
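The data collection step can be sketched in Python like this. It is a simplified illustration, not my actual tool: `collect_points`, `toy_evaluate` and the toy numbers are hypothetical, and in practice `evaluate` would be a full backtest run of the EA.

```python
from itertools import product

def collect_points(windows, param_grid, evaluate):
    """Evaluate EVERY parameter set on every in-sample/out-of-sample pair
    and keep (in_sample_profit, out_of_sample_profit) for the profitable ones."""
    points = []
    for (train, test), params in product(windows, param_grid):
        in_profit = evaluate(params, train)
        if in_profit > 0:  # only candidates that made money in-sample
            points.append((in_profit, evaluate(params, test)))
    return points

# Toy stand-in for a backtest (assumption): profit = parameter * sum of the data.
def toy_evaluate(param, data):
    return param * sum(data)

windows = [([1, 2], [3]), ([2, 3], [4])]
points = collect_points(windows, [-1, 1, 2], toy_evaluate)
print(points)
```

Each tuple is one dot in a scatter plot of in-sample profit against out-of-sample profit. Nothing is selected or fitted at this stage, the data is only harvested.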


X-axis: Profit in in-sample.
Y-axis: Profit in forward trading.

This incorporates 250,000 single in-sample/out-of-sample tests, done on the “Moving Average” EA that is shipped with every MT4 installation.

You see the clear trend? That is an in-depth view that no manual trader can EVER get on a trading system, as only algorithms are capable of generating and analysing such an amount of data.

Also, most people would dismiss candidates with very high profit as “overfitted” (I had the same opinion on this), but data clearly shows that is wrong. The more profit in optimisation, the better in live trading - for this particular strategy at least.

Optimisation comes into play when you say to the algorithm “find the optimum of XYZ”. But if the algo just harvests and preprocesses ALL of the data, and then you, as a human, spot the optimum yourself, overfitting is prevented as there is no fitting, just an evaluation of all data.

-Darwin