Walk Forward Analysis - the only logical successor to backtesting [DISCUSS]

Hello,

I’m Darwin and today I want to explain how a walk-forward-analysis works and why it is the only logical way to analyse EAs.

I know that the title is a bit provocative, but it’s often easier to start a discussion with a controversy.
Also, I know that the best EA-traders can use normal backtesting and still be profitable - but most cannot. And even if you can, a WFA is still better.

I WILL RELEASE A WALK-FORWARD-ANALYZER FOR FREE WITHIN THE NEXT FEW DAYS THAT DOES ALL OF THIS 100% AUTOMATED; UNTIL THEN, USE THE TIME TO MAKE YOURSELF FAMILIAR WITH THE CONCEPTS.

Nevertheless, none of this article is needed to use the walk-forward-analyzer tool; it will be as easy to use as the metatrader4-backtester.

After reading, you have 2 choices:
Agree with my arguments and once and for all get rid of the flawed backtesting-approach and use WFA in the future.
Disagree with my arguments, but then please try to argue with me… Do yourself a favour, don’t just stick to backtests because you “know” them or something like that.

Also, please read my first article: “Why backtests are worthless, fixed-logic-EAs are flawed and your parameters are bad [DISCUSS!]” (here on the forum)

Initial Situation

The 3 parts of each trading-system.

1. The system’s logic
The most obvious part! And for a lot of beginners it’s the only part they know, which is dangerous.
This might be a manual trading system or an expert advisor or any other form of fixed trading-logic / trading-system / trading-strategy (btw: all 3 terms name the same thing in this article)

But you all know that every strategy has some kind of variables/parameters (like the periods of moving averages or SL levels etc), that are NOT FIXED(!) but can vary, which brings us to the second part.
(If you just set them to a fixed value because “this should work”… well, it won’t, at least not in the long term)

2. The system’s parameter-ranges
The ranges of the parameters are an ESSENTIAL part of every trading-system, as they determine the exact behaviour of it (but the trading-logic always stays the same).

So, a moving-average-period might range from 5-15 to capture short-term price movements.
It is not 6 and not 11, it is 5-15. As the markets change, we can’t choose one concrete value; it is ALWAYS a range!

3. The market, the amount of data, the desired characteristics
Every strategy trades on a market, so we have to decide which one (e.g. EURUSD / H4).

But that’s not enough: we also have to determine how much past price data we want to use to evaluate our possible parameter choices.
Because, as I said, a system always has parameter-ranges, but for live trading we have to choose concrete values!

And we do this by evaluating all parameter-possibilities on the last X years of price-data, and after evaluation,
we end up with a huge list of possible and “independent” trading-systems (each of them with different parameters, but the same main-logic).
And each has its own characteristics like “profit” or “profit factor” or “relative drawdown”.

So, we also have to determine how to pick the “best” parameters.

But it’s not as simple as saying “I want much profit”, because the characteristics often don’t hold in the future!
Instead we want to choose in a way that gives us a high probability of picking parameters that will succeed during live trading.

Simple, isn’t it?

An illustrating example

The system’s logic:
Let’s suppose a very basic trading-system: “If the price moved more than X pips in the last Y days, a course correction will happen”
(just thin air!).
The parameters would be X and Y in this case.

The system’s parameter-ranges:
I chose X to be 100-200 pips in this example, and Y to be 2-3 days.

Amount of data & preferred characteristics:
Let’s use 10 years to evaluate the possible parameters on, and “profit” as the preferred characteristic.

The process:
Ok, now before we can trade that system, we make an optimisation on the last 10 years.
That means we backtest every possible parameter-combination for our system and choose the best in terms of “profit”.

For the sake of simplicity, here is a cropped example:

“If the price moved more than 100 pips in the last 2 days, a course correction will happen”
=> 1000$ in the last 10 years
"If the price moved more than 150 pips in the last 2 days, a course correction will happen"
=> 1200$ in the last 10 years
[…]
“If the price moved more than 150 pips in the last 3 days, a course correction will happen”
=> 950$ in the last 10 years
"If the price moved more than 200 pips in the last 3 days, a course correction will happen"
=> 950$ in the last 10 years

So, according to our preferred characteristic (profit), we would choose X = 150 pips; Y = 2 days, and then just trade the strategy.
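
The optimisation step described above can be sketched in a few lines of Python. This is only an illustration: `backtest_profit` is a hypothetical stand-in that returns the four profit figures from the cropped example instead of replaying real price data, and `optimise` is a name made up for this sketch.

```python
from itertools import product

def backtest_profit(x_pips, y_days):
    """Hypothetical stand-in for a real 10-year backtest.

    Returns the profit figures from the cropped example, or None for
    combinations the example does not list.
    """
    example_results = {
        (100, 2): 1000.0,
        (150, 2): 1200.0,
        (150, 3): 950.0,
        (200, 3): 950.0,
    }
    return example_results.get((x_pips, y_days))

def optimise(x_values, y_values, evaluate):
    """Evaluate every parameter combination and return (best_params, best_profit)."""
    best_params, best_profit = None, float("-inf")
    for params in product(x_values, y_values):
        profit = evaluate(*params)
        if profit is None:  # combination not covered by the cropped example
            continue
        if profit > best_profit:
            best_params, best_profit = params, profit
    return best_params, best_profit

best_params, best_profit = optimise([100, 150, 200], [2, 3], backtest_profit)
print(best_params, best_profit)  # (150, 2) 1200.0
```

Note that this only ranks parameters by past profit. It says nothing yet about how they behave afterwards.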

How to ask the right questions

Ok, now that I have described the current process and how it is all done, here comes the “new” part.
The goal itself always stays the same, we want to pick the best parameters (based on some kind of evaluation on the past), and then we want to trade live!

Remember: The only thing a backtest can tell is “How well did my system+parameters perform in the past”.
But that is NOT what we want to know! Be sure that you really understand this.

Initial Question; What we actually want to prove with analysis.
“Does the way we choose parameters for live trading (‘pick the one with best profit over the last 10 years’ in the above example) give us a high probability to pick parameters that are profitable during live trading?”

So we are actually interested in the relationship of past-performance&future-performance, not backtest-results!!

If the answer is No, one or more of the 3 things described in “Initial Situation” are wrong. It might be the logic itself, the parameter ranges, etc.
If the answer is Yes, the performance in the past and the performance in the future are somehow correlated for our EA, and we can trade the system!

The logical evolution; From Backtests to Walk-Forward-Analysis

Step One: Backtesting - in its worst form

Pro:
[ul]
[li]We will get parameters that performed well on a wide range of data
[/li][/ul]

Contra:
[ul]
[li]Every single EA trader that used this method and then tried to trade an EA live, based on good backtests, can tell you: It just does not work this way.
[/li]

[li]Overfitting / Curvefitting!! We first optimise the parameters, and then test them, all on the same data.
[/li]That means we have no clue if we have valid parameters or overfitted ones.

Overfitting means, we optimised towards a random behaviour within our data, that just exists in this particular dataset, and will not hold in the future.

That means, we captured a relationship that existed but was not a sound one. Like this:
http://s1.directupload.net/images/131125/5ccypnb8.jpg (sorry, can’t include >4 images…)

Don’t fool yourself into thinking “ah, this won’t happen”… Almost all “relationships” within the markets are like this, as most price-movements are random!!

If you do not understand overfitting, google for more information, as it is our archenemy in mechanical trading.

[li]No significance for future performance!! Remember that the initial question is not how well our parameters performed in the past, but how high the probability of success AFTER the optimisation-timespan (so, “in the future”, during live trading) will be.
[/li]
As we did not do any tests with our parameters that take into account the relative future, we did not even try to answer the initial question.
We just answered the question “How well did our parameters perform in the past”, not taking into account anything about the “future” => very bad.

[li]Even if you could somehow magically invalidate my points above: parameters that worked well on a huge amount of data are not really the best for the current market - they are just averagely good across all market-conditions.
[/li]

[/ul]

Step Two: Backtesting using unseen/out-of-sample data

Notice: The first dataset, we use to optimise our parameters on, is called “in-sample” (is). The second, unseen, dataset is called “out-of-sample” (oos).
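
As a minimal sketch of that split, the helper below cuts a chronologically ordered list of bars into an in-sample and an out-of-sample part. The function name and the 70/30 ratio are assumptions chosen for illustration; the article does not prescribe a ratio.

```python
def split_in_out_of_sample(bars, in_sample_percent=70):
    """Split chronologically ordered bars into (in_sample, out_of_sample).

    The 70/30 default is an assumption for illustration only.
    """
    if not 0 < in_sample_percent < 100:
        raise ValueError("in_sample_percent must be between 1 and 99")
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(bars) * in_sample_percent // 100
    return bars[:cut], bars[cut:]

# Ten placeholder bars, oldest first (the numbers stand in for real price data).
bars = list(range(10))
in_sample, out_of_sample = split_in_out_of_sample(bars)
print(in_sample)      # [0, 1, 2, 3, 4, 5, 6]
print(out_of_sample)  # [7, 8, 9]
```

The optimisation would only ever see `in_sample`; the chosen parameters are then run once on `out_of_sample`.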

Pro:
[ul]
[li]We now have a lower chance to get overfitted parameters, as we use an independent dataset to validate our parameter-choices.
[/li]

[/ul]

Contra:
[ul]
[li]Due to the infinite amount of senseless/unsound relationships within the markets, we still have a (too high) risk for overfitting, as chances are too high that we just got parameters that are valid (curvefitted) on both datasets, but not valid in the future.
[/li]

[li]If the system did not work in out-of-sample and you then begin to tune your parameters until you get good oos-results, your oos-data is no longer “unseen” and becomes “in-sample”, which makes the whole approach using 2 datasets useless!
[/li]

[li]We still use a very large part of our data (in-sample) to find the best parameters, which also means we use a lot of “old” data. That is not a good decision, as the behaviour of the markets in the past is not equal to the behaviour of today.
[/li]

[li]Not only is our in-sample dataset too large, our out-of-sample dataset is also too large and therefore unrealistic. In the example above it would be a few years, but would you really like to trade a system for years before choosing new, re-adjusted parameters? I would not!
[/li]

[/ul]

Step Three: Backtesting using a more realistic data-amount

Pro:
[ul]
[li]We now only use the recent market-behaviour to optimise our parameters, so we capture the market “at the moment”, and not “10 years ago”.
[/li]

[li]We now test our parameters on a timespan that is more realistic (as it is not years but months!)
[/li]

[/ul]

Contra:
[ul]
[li]We only used a small part of the available price-data for our tests. This is not very efficient!
[/li]

[li]Ok, remember the initial situation, where we have settled on parameter ranges, the amount of data to optimise on, and the “desired characteristic”. The purpose of our analysis is to verify whether these choices are valid or not.
[/li]
But in this case, we only made one test with them, so we optimised on one part of the data, then we chose 1 parameter-combination and tested it on 1 “unseen” dataset.

Facing the million/billion possible parameter-combinations an EA can have, and the infinite ways the markets can change to generate new and “unseen” behaviour, do you really think that 1 test, 1 datapoint, 1 past->future relationship, is enough to judge from? Of course not! So why are you still using normal backtests?

[/ul]

Step Four: Walk Forward Analysis

So, as you might see, a Walk Forward Analysis is the same thing as doing a normal back- & out-of-sample-test, but we do it over and over again, so we end up not just with 1 test-case but with many (100-150 in most cases, up to 1000 if we choose a very small test-period).

That way we can verify our system + our optimisation-methodology on many, many independent test-cases, which is THE reason why we want to use WFA instead of every other analysis-method described in here.
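
To make the “over and over again” concrete, here is a rough sketch of how the rolling windows could be generated. The function name, the monthly granularity and the 6-month/1-month window lengths are assumptions for illustration, not settings taken from the analyzer tool.

```python
def walk_forward_windows(n_periods, in_sample_len, out_of_sample_len):
    """Yield (in_sample, out_of_sample) index ranges that roll forward through the data.

    Each step advances by out_of_sample_len, so the out-of-sample ranges never overlap.
    """
    start = 0
    while start + in_sample_len + out_of_sample_len <= n_periods:
        split = start + in_sample_len
        yield range(start, split), range(split, split + out_of_sample_len)
        start += out_of_sample_len

# Assumed setup: 10 years of monthly periods, optimise on 6 months, test on the next month.
windows = list(walk_forward_windows(n_periods=120, in_sample_len=6, out_of_sample_len=1))
print(len(windows))  # 114
first_in_sample, first_out_of_sample = windows[0]
print(list(first_in_sample), list(first_out_of_sample))  # [0, 1, 2, 3, 4, 5] [6]
```

With these assumed lengths, 10 years of data give 114 test-cases, which is in the 100-150 range mentioned above. A shorter test period gives more of them.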

Pro:
[ul]
[li] For our final analysis-report, we only take into account the green (out-of-sample) test-results, as they are the “unseen future” relative to the red (in-sample) optimisation-windows.
[/li]That way, we simulate the same process we would face during live trading: Optimisation on the past, trading on the (relative) future!

That allows us to give meaningful answers to the initial question, as we only analyse performance in “the future”.

[li]We use all data available for our testing
[/li]

[li]We have 100-150 independent “PAST=>FUTURE”-relationship-tests, which gives us a clue about the future performance, not the past performance!
[/li]

[li]We avoid overfitting, as we use different datasets to optimise and verify our parameters
[/li]

[li]If we want to trade live, we simply make “one more step” of the WFA, optimise on the last available data (the “red” dataset would then end at the end of the chart), and then trade “in the future” (the “green” dataset would be our live trading). So we trade the system using the EXACT same methodology we have tested 100-150 times already.
[/li]

[li]Due to the frequent re-optimisation of parameters, the EA is also continuously re-adapted to the markets, which will most likely increase the overall profit.
[/li]

[li]A traditional backtest answers the question “How good was my EA in the past”, whereas a Walk Forward Analysis answers the question “How good will my EA be in the future, during live trading”.
[/li]

[li]It does not only evaluate an EA, it also evaluates the corresponding trading plan that determines how to pick the best parameters for live trading.
[/li]

[/ul]
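
The whole procedure can be sketched as a small loop. This is a toy under stated assumptions, not the analyzer tool: `toy_profit` is a hypothetical strategy that is simply always long (+1) or always short (-1), and `changes` is a made-up list of price changes, not market data.

```python
def walk_forward_analysis(data, candidates, evaluate, in_sample_len, out_of_sample_len):
    """Repeat 'optimise on the past, test on the relative future' across the data.

    Returns one out-of-sample profit per step; in-sample profits are
    deliberately left out of the report.
    """
    oos_profits = []
    start = 0
    while start + in_sample_len + out_of_sample_len <= len(data):
        split = start + in_sample_len
        in_sample = data[start:split]
        out_of_sample = data[split:split + out_of_sample_len]
        # Optimisation step: pick the candidate with the best in-sample profit.
        best = max(candidates, key=lambda params: evaluate(params, in_sample))
        # Verification step: score that single choice on data it has not seen.
        oos_profits.append(evaluate(best, out_of_sample))
        start += out_of_sample_len
    return oos_profits

def toy_profit(direction, changes):
    """Hypothetical toy strategy: always long (+1) or always short (-1)."""
    return direction * sum(changes)

# Made-up price changes, purely illustrative (not market data).
changes = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1]
oos_profits = walk_forward_analysis(changes, [1, -1], toy_profit, 3, 1)
hit_rate = sum(p > 0 for p in oos_profits) / len(oos_profits)
print(oos_profits)                        # [-1, -1, 1, -1, -1, 1, -1, -1, 1]
print(sum(oos_profits), round(hit_rate, 2))  # -3 0.33
```

In this toy run only 3 of the 9 out-of-sample steps are profitable, so the answer to the initial question would be “No” for this logic and these settings. For live trading, the same optimisation step would be run once more on the most recent `in_sample_len` periods.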

“Contra”:

[ul]
[li]Most EAs will not pass this test. But this is not bad, because let’s be honest, almost all EAs in existence are bull****. So if almost all EAs tested with this approach give bad results, that is a good thing.
[/li]
Even if a lot of people do not like to be disillusioned about their “holy grail money printing machines”, it’s better to face the truth during EA-development and not during live trading.

[/ul]

Contra:
[ul]
[li]There are some limitations regarding this process which will be discussed in a later article, stay tuned! Also, I am currently working on more sophisticated analysis algorithms, but it will take a few months until I can show you something.
[/li]

[/ul]

The main advantage is that we get 100-150 independent test-cases, whereas a Backtest+Out-of-sample-test gives us only 1 test-case (or 1 datapoint).

And now, please discuss.

-Darwin